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ABSTRACT 


This  investigation  considers  three  closely  related  problems:  the 
optimum  filtering  of  stationary  or  near- stationary  random  processes  with 
unknown  parameters  from  an  infinite  parameter  set;  estimation  of  the 
state  of  a  linear  discrete  dynamical  system  with  nongaussian  noisy  inputs; 
and  applications  of  state  estimation  theory  to  detection.  Hie  form  of 
the  optimum  filter  when  !:he  parameters  are  unknown  is  found  to  have 
weights  that  are  averages  of  simple  functions  of  the  signal  and  noise 
spectra  averaged  over  the  parameter  space.  Practical  methods  for 
implementation  are  given.  The  key  problem  in  nonlinear  state- variable 
estimation  is  obtaining  the  joint  density  of  the  states  and  the  observa¬ 
tions  in  a  convenient  form.  This  problem  is  solved,  and  surface  search¬ 
ing  is  used  to  find  the  mode.  The  number  of  dimensions  of  the  surface 
is  the  same  as  the  order  of  the  dynamical  system.  A  new  approach  to 
linear  state  estimation  is  given;  and  this  theory  iB  applied  to  the 
problem  of  detecting  a  gaussian  signal  in  gaussian  noise.  A  time- 
invariant,  near-optimum  detector  of  small  dimensions  is  derived. 
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I .  INTRODUCTION 


A.  OUTLINE  OF  THE  PROBLEM 

This  investigation  concerns  the  optimal  estimation  or  detection  of 
a  sampled,  vector-valued  stochastic  process  that  may  be  generated  by  a 
noisy  discrete,  linear,  dynamical  system.  The  system  inputs  are  a 
sequence  of  independent  random  variables,  i.e.,  white  noise.  The  system 
output  is  corrupted  with  additive  white  noise.  In  the  first  part  of 
this  report  the  white  noise  is  assumed  to  be  gaussian  (linear  estimation 
is  optimum);  later,  nongaussian  inputs  and  output  noise  are  assumed  (in 
general,  nonlinear  estimation  will  be  optimum).  Hie  stochastic  processes 
may  or  may  not  be  stationary  and,  for  most  of  the  report,  the  process 
parameters  will  be  assumed  to  be  known  a  priori.  In  Chapter  V,  however, 
consideration  is  given  to  the  important  problem  of  estimating  the  value 
of  the  process  when  it  is  stationary  or  nearly  stationary  and  when  the 
parameters  are  assumed  to  come  from  some  infinite  set  with  some  a  priori 
distribution. 

In  this  analysis,  the  word  "estimation"  will  mean  either  filtering 
or  interpolation.  An  optimum  estimate  is  defined  as  one  that  minimizes 
a  generalized  mean- squared- error  performance  criterion  or  maximizes  a 
conditional  density.  Filtering  is  defined  as  the  estimation  of  a  present 
value  conditioned  on  all  the  past  data.  Frequently,  the  term  "smoothing" 
is  used  in  place  of  Interpolation  or  estimation  of  a  past  value  condi¬ 
tioned  on  all  data  to  the  present. 

Engineering  examples  of  the  above  processes  are  listed  below.  An 
Important  example  is  a  space  vehicle  in  orbit.  The  equations  of  motion 
when  linearized  correspond  to  a  linear  dynamical  system.  The  atmospheric 
drag  may  be  represented  as  a  noisy  input.  Range  and  velocity  of  the  space 
vehicle  are  measured  over  a  noisy  radio  channel.  This  channel  noise 
constitutes  the  output  noise.  Usually,  the  noise,  as  it  appears  to  the 
velocity-  and  range- measuring  equipment,  is  nongaussian. 

An  equally  important  example  of  this  type  of  system  is  a  rocket  under 
power.  The  noise  inputs  are  caused  by  random  variations  in  motor  thrust 
amplitude  and  direction.  The  trajectory  information  is  also  transmitted 
over  a  noisy  radio  channel. 
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An  example  of  a  stationary  process  with  unknown  parameters  from  an 
infinite  set  is  a  satellite  or  space  vehicle  transmitting  on  an  unknown 
frequency  due  to  uncertainty  in  the  doppler  shift  or  drift  in  transmitter 

•  a 

frequency.  Determining  and  tracking  this  frequency  are  the  central 
problems  in  space  communications.  For  most  types  of  modulation,  tech¬ 
niques  similar  to  those  discussed  in  Chapter  V  provide  by  far  the  best 
solution  known.  An  example  of  a  near- stationary  process  is  a  signal  with 
known  parameters  and  unknown  jamming,  where  the  jamming  corresponds  to 
an  unknown  output  noise.  Use  of  the  moon  as  a  passive  reflector  for 
communications  from  one  earth  point  to  another  is  another  example  of  a 
near- stationary  process.  We  might  also  include  in  this  class  a  tracking 
antenna  system  using  conical  scan.  Here,  the  system  gain  is  directly 
proportional  to  the  unknown  signal  strength  which  slowly  varies. 

B .  PREVIOUS  WORK 

Kalman  and  Koepcke  in  their  pioneering  work  [Ref.  l]  have  considered 
the  optimal  filtering  and  prediction  of  sampled  gauss-markov  stochastic 
processes  when  the  parameters  of  the  process  are  known.  Rauch  [Ref.  2] 
has  extended  this  analysis  to  include  interpolation  when  the  input  noise 
is  gaussian,  when  there  is  no  output  noise,  and  when  the  parameters  are 
a  sequence  of  independent  random  variables  with  known  means  and  variances. 
Widrow  [Ref.  3]  and  Gabor  et  al  [Ref.  4]  have  independently  investigated 
and  constructed  systems  that  adapt  by  using  a  noise-free  sample  of  the 
signal.  Since  in  many  practical  situations  the  noise-free  sample  will 
not  be  available,  this  type  of  adaption  was  not  considered  in  this 
investigation. 

Magi 11  [Ref.  5]  has  used  the  Hilbert  space  approach  (approximately 
concurrently  with  this  investigation)  to  simplify  the  derivation  of  the 
filter  and  interpolation  of  gausa-markov  processes.  He  has  also  given 
the  form  of  the  optimal  estimate  of  a  gauss-markov  process  when  a  finite 
set  of  parameters  is  distributed  in  general  according  to  some  arbitrary 
density.  Cox  [Ref.  6]  discusses  state-variable  estimation  of  nonlinear 
systems  with  gaussian  inputs.  His  approach  involves  a  system  linearization. 
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Work  in  this  investigation  contains  an  extension  of  Magill's  adaptive 
estimation  for  a  finite  number  of  parameters  to  estimation  where  the 
parameters  may  come  from  an  infinite  set.  The  theory  of  nonadaptive 
estimation  is  extended  to  include  dynamical  systems  with  nongaussian 
inputs  and  output  noise. 

C.  OUTLINE  OF  NEW  RESULTS 

This  investigation  gives  solutions  to  three  closely  related  problems: 

1.  The  theory  of  adaptive  estimation  is  extended  to  stationary  or 
near-stationary  processes  with  parameters  from  some  infinite 
parameter  set. 

2.  The  theory  of  nonadaptive  estimation  is  extended  to  include 
nonlinear  filtering  and  interpolation  of  the  state  variables  of  a 
dynamical  system  excited  by  nongaussian  random  inputs. 

3.  Methods  are  given  for  greatly  simplifying  the  optimum  detection 
procedures  when  the  signal  can  be  considered  as  the  output  of  a 
linear  dynamical  system  excited  by  random  noise. 

Chapter  II  contains  a  description  of  the  two  main  mathematical  methods 
of  system  description  that  are  used  in  this  report.  This  chapter  also 
contains  a  detailed  description  of  the  random  process. 

In  Chapter  III  a  new  approach  to  the  linear  estimation  of  the  state 
variable  of  a  discrete  linear  dynamical  system  is  presented  which  shows 
the  close  relationship  between  state-variable  estimation  and  pattern 
recognition.  The  estimation  theory  in  Chapter  III  was  inspired  by,  and 
is  a  straightforward  extension  of,  the  pattern-learning  theory  developed 
by  Abramson  and  Braverman  [Ref.  7].  In  both  cases,  it  is  desired  to 
learn  the  conditional  mean  of  a  gaussian  vector-valued  random  variable. 

The  equations  for  the  mean  and  the  covariance  matrices  derived  by 
Abramson  and  Braverman  are  almost  identical  to  the  filtering  equations  of 
Chapter  III.  The  author  feels  that  the  new  approach  is  far  simpler  and 
gives  a  greater  intuitive  insight  than  other  methods  that  have  been 
suggested  earlier. 

The  chapter  also  provides  necessary  background  for  the  three  chapters 
that  follow.  A  new  problem  is  solved  in  Sec.  C:  estimation  of  state 
variables  when  the  estimates  are  taken  through  a  second  dynamical  system 
(as  will  very  often  be  the  case).  Equations  derived  by  Magill  may  be 
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used  to  make  these  estimates;  however,  the  order  of  Magill's  matrix 
equations  can  be  twice  as  great  as  those  presented  here.  If  a  large 
amount  of  data  is  processed,  the  reduction  in  calculation  could  be 
significant . 

Chapter  IV  applies  the  theory  of  linear  state-variable  estimation 
to  the  problem  of  detecting  a  gaussian  signal  immersed  in  additive 
gaussian  noise.  It  is  shown  that  a  long-standing  conjecture  about  the 
possibility  of  simplifying  the  detection  procedure  is  often  true.  In 
Kailath's  solution  [Ref.  8],  the  optimum  detector  contains  an  operator 
that  gives  the  best  estimate  of  each  signal  sample  value  during  the 
detection  interval  based  on  all  the  data  observed  during  the  interval. 

If  several  thousand  data  points  are  observed,  a  matrix  of  the  same  order 
must  be  inverted.  When  the  signal  can  be  represented  as,  or  approximated 
by,  the  output  of  a  noisy  dynamical  system,  the  estimation  equations  of 
Chapter  III  may  be  applied  directly.  The  matrices  to  be  inverted  will 
be  no  larger  than  the  order  of  the  dynamical  system  regardless  of  the 
number  of  data  points.  Further  simplification  results  if  the  estimator 
of  a  sample  value  is  truncated  when  the  error  covariance  matrix  shows 
that  there  will  be  little  reduction  in  mean-squared  error  by  conditioning 
on  additional  data  points.  A  near-optimum  time-invariant  detector  is  then 
shown  to  exist.  The  advantages  of  time- invariant  circuitry  when  analog 
networks  are  used  cannot  be  overemphasized.  Time-variable  analog 
networks  of  the  complexity  required  in  this  problem  are  extremely 
difficult  and  expensive  to  build.  This  form  of  detector  is  the  form 
most  convenient  to  Instrument,  using  the  newly  developing  and  powerful 
methods  of  optical  data  processing. 

Chapter  V  describes  optimum  estimation  when  the  process  is  stationary 
or  near  stationary  and  when  the  process  parameters  are  unknown  but  may 
assume  any  one  of  an  infinite  number  of  possible  values.  The  term  "near- 
stationary"  is  used  rather  loosely.  In  practice,  the  filter  will  adapt 
so  quickly  that  good  results  may  be  obtained  on  processes  many  persons 
would  call  highly  nonstationary. 

The  estimator  weights  are  functions  of  the  parameters,  and  it  is  shown 
that  the  optimum  estimator  is  formed  by  taking  the  expected  value  of 
the  weights  over  the  parameter  space  conditioned  on  the  observed  values. 
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It  is  then  shown  that  the  parameters  enter  into  the  weights  as  functions 
of  the  signal  and  noise  spectrum  in  a  very  simple  manner.  The  optimiza¬ 
tion  procedure  thus  involves  learning  these  spectral  functions  conditioned 
on  the  data. 

It  is  believed  that  this  device  will  have  numerous  applications  in 
the  field  of  space  communications.  As  mentioned  in  Sec.  A,  a  fundamental 
problem  is  the  frequency  tracking  of  a  narrowband  signal.  The  usual 
approach  is  to  use  frequency  modulation  or  to  insert  an  unmodulated 
carrier.  In  either  case  a  phase-locked  loop  may  be  locked  onto  the 
carrier  and  used  as  a  frequency  reference  for  a  narrowband  filter. 

Frequency  modulation  very  often  is  not  the  best  way  to  modulate,  nor 
does  the  inserted  carrier  contain  information,  and  thus  their  use  lowers 
tho  system  signal- to-noise  ratio  for  a  fixed  transmitter  power.  In 
electronic  surveillance  work,  the  opponent  is  hardly  ever  considerate 
enough  to  include  a  tracking  carrier!  One  of  his  favorite  tactics  is 
to  shift  his  transmitter  frequency  in  a  manner  unknown  to  the  receiver. 

Such  "carrierless"  situations  show  the  filter  of  Chapter  V  off  to  good 
advantage  since,  unlike  the  phase- locked  loop,  it  will  automatically 
center  itself  about  the  signal  in  the  form  of  a  narrowband  filter. 

Chapter  VI  discusses  nonlinear  estimation  (including  the  best  nonlinear 
predictor)  of  state  variables  of  linear  systems  with  nongaussian  inputs  or 
output  noise.  With  the  exception  of  the  types  cf  distributions,  the  model 
of  the  process  is  identical  to  that  used  for  linear  estimation.  First, 
there  is  a  proof  of  the  necessary  and  sufficient  conditions  for  the  dis¬ 
tribution  of  the  state  variables  to  converge  to  the  gaussian.  The  esti¬ 
mates  found  in  Chapter  VI  are  either  Bayesian  or  maximum  likelihood,  and 
the  key  problem  is  finding  the  conditional  density  in  a  convenient  form. 

The  Markov  property  of  the  state  variables  is  used  to  simplify  this 
rather  complex  density  and  then  surf ace- searching  techniques  are  used  to 
find  the  mode.  An  important  result  is  proof  that  near-optimum  (linear  or 
nonlinear)  estimates  of  the  state  of  many  dynamical  systems  do  not 
require  conditioning  on  all  available  data,  but  may  be  made  using  only  a 
short  sequence  of  observations.  The  length  of  this  sequence  may  be  related 
directly  to  the  rate  of  decay  of  initial  conditions  in  the  dynamical 


system.  The  saving  in  computation  time  may  be  very  significant.  The 
dimensions  of  the  surface  to  be  searched  are  the  same  as  the  order  of 
the  dynamical  system. 

Use  of  this  theory  is  envisioned  in  a  situation  such  as  the  one 
given  below.  Much  of  the  ballistic  missile  work  at  Cape  Kennedy  is 
concerned  with  measurement  of  missile  accuracy.  The  trouble  is  that  the 
external  ground-based  measuring  equipment  is  no  more  accurate  than  the 
missile  guidance  and  thus  cannot  offer  any  real  check  on  the  trajectory. 
Any  increase  in  guidance  accuracy  will  completely  swamp  the  measuring 
equipment.  (The  seriousness  of  the  problem  has  prompted  the  government 
to  issue  a  large  contract  for  range  modification,  although  it  is  the 
opinion  of  many  that  the  point  of  diminishing  returns  in  measurement- 
equipment  accuracy  has  already  been  passed.)  In  data  reduction  the 
standard  procedure  is  to  make  a  linear  least-mean- squared  estimate. 

Since  it  is  known  that  the  statistics  are  nongaussian,  nonlinear  estima¬ 
tion  may  offer  a  possibility  of  significant  improvement.  The  asymptotic 
behavior  of  the  estimator  as  the  output  noise  decreases  is  also  discussed. 
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II.  STATEMENT  OF  TOE  PROBLEM  AND  MODEL  OF  TOE  PROCESS 


It  is  desired  to  form  an  estimate  of  a  sampled-data,  random-message 
process  corrupted  by  additive  noise.  The  random  message  or  the  additive 
noise  or  both  may  be  nongaussian.  The  observable  process  (the  process 
from  which  the  estimations  are  made)  is  assumed  to  be  sampled,  either 
in  scalar  or  vector  form,  and  for  most  of  this  report  is  assumed  to  be 
generated  by  processes  with  known  statistics.  In  Chapter  V,  however, 
it  is  assumed  that  the  estimates  are  made  of  a  process  with  unknown 
parameters  and  that  these  parameters  may  come  from  an  infinite  set. 


A.  STATE-VARIABLE  AND  SAMPLE-VALUE  REPRESENTATIONS  OF  DISCRETE 
LINEAR  SYSTEMS 


Two  mathematical  methods  for  describing  linear  discrete  dynamical 
systems  are  used  in  this  report.  The  first  is  called  the  state-variable 
representation,  and  the  second  is  known  as  the  sample-value  representation. 
Since  many  engineers  are  familiar  with  one  or  the  other,  but  not  with 
both,  the  methods  are  discussed  briefly  in  this  section. 

Consider  the  transfer  function 


o(z) 


,-2 


-1. 


Y(Z) 

.-is  "CTT 


(2.1) 


(1  -  aZ  )(1  -  bZ  A)  ~2' 

where  u^(z)  is  a  noisy  control  input  and  Y(Z)  is  the  output.  A  block 
diagram  having  this  transfer  function  is  shown  in  Fig.  1.  The  input 
u.(k)  is  a  second  input  of  noise  alone.  Let  x  (k)  be  the  value  (or 
state)  of  the  output  of  the  right-hand  delay  at  the  k  sampling 
instant,  and  let  x2(k)  be  the  value  at  the  output  of  the  other  delay. 
Then  the  state  or  state  vector  of  the  system  is  defined  as 


X(k) 


"*1(k) 

x2(k)_ 


(2.2) 


In  general,  any  sampled-data  transfer  function  may  be  reduced  to  block 
diagram  form  with  feedback  around  delays,  and  a  state  vector  may  be  defined 
with  the  values  of  the  delay  outputs  at  the  sampling  instants  as  vector 
elements. 
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u,(k-t)» 


CONTROL 

INTPUTc 

Uj(k-I) 


'Y(k) 


FIG.  1.  AN  EXAMPLE  OF  A  DISCRETE  LINEAR  SYSTEM. 

The  vector  X(k)  may  be  found  from  a  set  of  difference  equations 
that  may  be  written  as 


X(k)  »  OX(k-l)  +  ru(k-l)  (2.3a) 

The  matrix  $  is  known  as  a  "transition  matrix"  and  is  m  x  m  where  m 

o  o  o 

is  the  order  of  the  system.  This  matrix  may  be  time  variable,  but  to 

simplify  notation,  its  argument  will  not  be  carried  along.  The  ij 

element  of  $,  t  ,  is  the  gain  between  the  output  of  the  1th  delay 
*  J  £|| 

and  the  input  to  the  j  delay.  The  "input  matrix”  r  is  also  m  xm 

o  o 

and  it  determines  where  the  input  vector  U(k-l)  is  applied  to  the 
system. 

The  system  output  may  be  a  vector  or  it  may  be  scalar,  and  it  usually 
is  a  linear  combination  of  the  states.  The  output  can  be  written  as 

Y(k)  -  HX(k)  (2.3b) 

where  H  is  a  q  \  aQ  matrix  (vector  outputs),  with  q  the  number  of 
outputs.  The  H  matrix  also  may  be  time  variable.  The  system  equations 
for  the  system  of  Fig.  1  will  then  be 
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X(k)  = 


'a  1* 

x1(k-l)' 

+ 

'i  o" 

u1(k-l) 

0  b 

x2(k-l) 

c  1 

u2(k-l)_ 

Y(k)  =  [1  0] 


t1(k) 


LX2(k). 


(2.4a) 


(2.4b) 


The  second  mathematical  description  will  be  used  when  the  input  and 
the  output  of  the  dynamical  system  are  both  scalar.  The  output  of  a 
linear  system  is  a  linear  combination  of  the  inputs,  or 

k 

c(k)  =  Sj(k)  r(k-j)  (2.5) 

j=0 

where  r(k)  is  the  input  and  c(k)  is  the  output.  The  system  may  be 
thought  of  as  a  tapped  delay  line  similar  to  the  one  shown  in  Fig.  12 
of  Appendix  C.  If  r(t)  is  a  bandlimited  continuous  function  with 
value  r(k)  at  the  k  sampling  instant,  and  if  the  time  between 
samples,  T,  is  less  than  l/2B,  where  B  is  the  bandwidth,  the  well- 
known  sampling  theorem  [Ref.  9]  shows  that  there  is  a  one-to-one  cor¬ 
respondence  between  the  sequence  r(k),  k  =  0,1,..., k  ,  and  r(t)  over 
the  interval  [0,  kQT] .  There  is  also  a  one-to-one  correspondence 
between  the  sequence  of  c(k)  and  c(t)  over  the  same  interval.  For 
a  time- invariant  system,  the  »j(k)  are  the  coefficients  in  the  series 
expansion  of  the  system  Z-transform. 

If  the  r(k),  k  =  0,1,..., k  are  the  (kQ+l)  elements  of  a  vector 
called  the  "input  vector,"  and  if  the  c(k),  k  =  0,l,...,k  ,  are  the 
(kQ+l)  elements  of  the  "output  vector,"  the  two  vectors  are  related  by 
the  following  matrix  equation.  (The  square  matrix  will  be  called  the 
"transfer  matrix.") 


'c(O)  * 

■a0(°) 

r(0)‘ 

c(l) 

•jd)  «0U) 

r(l) 

• 

= 

a2(2)  ax(2)  aQ(2) 

; 

c(ko) 

“k  (ko) 
o 

*0‘ko> 

r(ko) 

m  m 
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B.  MODEL  OF  TOE  PROCESS 


The  message  process  will  be  generated  by  random  inputs,  U(k),  to 
the  system  shown  in  Fig.  2.  It  is  assumed  that  U(k)  is  independent  of 
u(j )  for  j  ^  k.  The  element  u^(k)  is  also  assumed  to  be  independent 
of  u^(k)  for  l  ^  i. 


FIG.  2.  TOE  DYNAMICAL  SYSTEM. 

Much  use  will  be  made  in  this  report  of  a  characteristic  of  X(k) 
called  the  (strict)  "Markov  property."  This  property  is  defined  by  the 
relationship 


F[X(k)|X(l),...,X(k-l)]  =  F[X(k)|x(k-l)]  (2.8) 

where  F(*)  is  the  cumulative  distribution  function  (cdf).  In  other 
words,  the  density  of  X(k)  given  all  the  X(j)  up  to  X(k-l)  is  the 
same  as  the  density  conditioned  only  on  X(k-l).  From  Eq.  (2.3a)  it  is 
seen  that  if  X(k-l)  is  given,  then  the  only  random  variable  on  the 
right  is  U(k-l).  Since  U(k-l)  is  independent  of  U(j)  for  j  ■  k, 
the  density  of  X(k)  is  entirely  determined  by  X(k-l)  and  u(k-l). 

If  U(j)  is  a  gaussian  random  variable,  the  process  is  known  as  a 
"gauss- markov"  process. 

The  message  process  is  assumed  to  be  transferred  over  a  physical 
channel  (such  as  a  telemetering  link)  and  noise  will  be  added.  In  our 
model,  this  noise  is  indicated  by  the  vector  W(k),  and  Eq.  (2.3b)  will 
be  modified  to 

Y(k)  =  HX(k )  +  W(k )  (2.3c) 
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Estimations  of  X(j)  will  be  made'  by  observing  (the  observable  process) 
the  sequence  Y(l) , . . . ,Y(k) . 

Both  U(k)  and  w(k)  are  assumed  (without  loss  of  generality)  to 
have  zero  mean.  The  vectors  U(k)  and  W(j)  will  be  independent  for 
all  j  and  k,  and  w(k)  will  be  independent  of  W(j)  for  j  ^  k. 

Two  types  of  covariance  matrices  are  used  frequently  in  this  study. 
The  first  is  called  a  "state-variable  covariance  matrix"  and  is  denoted 
by  the  letter  K.  For  example, 


E[X(k)  X(k)1]  =  Kx(k) 

(2.9a) 

E[Y(k)  Y(k)t]  =  Ky(k) 

(2.9b) 

B[W(k)  W(k)t]  « 

(2.9c) 

where  superscript  t  means  the  transpose. 

Die  second  type  of  covariance  matrix  is  called  the  "time-series 
covariance  matrix."  An  example  would  be 


K 


S 


[■(1) . »(k)] 


(2.10) 
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III.  OPTIMUM  LINEAR  SMOOTHING  AND  FILTERING 


This  chapter  discusses  linear  smoothing  and  filtering.  The 
estimators  will  be  derived  by  assuming  that  the  input  random  process 
and  the  output  noise  are  gaussian.  So,  the  first  section  will  contain 
a  brief  discussion  of  linear  filtering  of  nongaussian  processes  with 
either  gaussian  or  nongaussian  output  noise.  Section  A  also  contains 
a  comparison  of  the  Bayesian  and  the  maximum  likelihood  estimation  of 
state  variables.  Section  B  contains  the  derivations  of  the  filtering 
and  smoothing  routines.  In  many  practical  applications,  the  observations 
will  be  made  through  a  second  dynamical  system,  and  Section  C  contains 
a  derivation  of  the  required  modifications  of  estimating  procedures. 

A.  LINEAR  FILTERING 

The  two  classes  of  estimators  to  be  considered  in  this  report  are 
Bayes  estimators  end  maximum  likelihood  estimators. 

Definition:  A  Bayes  estimator  X(j )  is  one  that  minimizes  the 
expected  risk, 

p(x)  =  j  L[X(  j ) ,X( J ) ]  P[X( j ) | Y( 1 ) . Y(k)]  dX( J ) 

where  L[X(j),X(j)]  is  the  loss  and  X(j)  is  the  estimate  of  X(j) 
given  Y(l) , . . . ,Y(k) . 

If 


L[X(j),X(j)]  =  |X(J)  -  X(j)| 2 


(3.1a) 


it  can  easily  be  shown  that 

X(J)  =  E[X(j)!Y(l) . Y(k) ]  (3.1b) 

The  loss  function  for  the  Bayes  estimate  will  always  be  that  defined  in 
Eq.  (3.1a),  i.e.,  minimizing  the  mean-squared  error. 
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Definition:  The  maximum  likelihood  estimate  of  X(j)  is  the  X(j) 
that  maximizes  P[X(j),Y(l) . Y(k)]. 

It  is  frequently  convenient  to  find  this  maximum  by  setting  the 
gradient  of  in{P[X( j ) j Y( 1 ) , . . . ,Y(k) ] }  with  respect  to  X(j)  equal  to 
zero.  The  elements  of  the  gradient  vector  are  known  as  "likelihood 
functions. " 

If  P[X(  J )  |  Y ( 1 ) . Y(k ) ]  is  symmetric  about  the  highest  mode,  the 

conditional  mean  and  the  maximum  likelihood  estimate  coincide.  Such 
conditions  exist  if  X(j)  and  W(j)  are  gaussian. 

In  Sec.  B,  the  Bayes  estimators  for  gaussian  inputs  are  seen  to  be 
linear.  Now,  if  the  actual  states  are  taken  from  the  dynamical  system 
and  subtracted  from  the  corresponding  estimator  output  to  obtain  the 
error,  the  system  from  the  noise  inputs  to  the  error  outputs  is  still 
linear.  Then,  the  error  covariance  matrix  will  be  just  a  linear  trans¬ 
form  of  the  input  covariance  matrix  plus  a  linear  transform  of  the  output 
noise  covariance  matrix.  The  mean-squared  error  is  the  trace  of  the 
error  covariance  matrix.  In  other  words,  the  mean-squared  error  of  the 
filter  that  is  optimum  for  gaussian  input  and  output  noise  is  a  function 
only  of  the  input  and  output  noise  covariance  matrices.  If  a  filter  is 
designed  to  be  optimum  for  a  gaussian  input  process  with  a  given  covariance 
matrix,  the  mean-squared  error  of  this  filter,  when  the  input  is  non- 
gausslan  with  an  identical  covariance  matrix,  will  be  the  same  as  the 
gaussian  input  mean-squared  error. 

In  Chapter  IV  (for  example)  it  is  seen  that  if  one  asks  the  question, 
What  is  the  linear  circuit  that  will  give  the  minimum  mean-squared  error? 
the  answer  is  found  to  be  a  function  Of  the  input  variance  and  covariance 
only.  Then,  the  filters  derived  in  this  chapter  are  the  best  linear  ones 
for  all  densities  with  the  same  covariance  matrix.  If  any  improvement 
is  to  be  gained  for  nongausslan  processes,  it  necessarily  will  be  a 
nonlinear  operation.  Nonlinear  smoothing  and  filtering  are  discussed 
in  Secs.  B  and  C,  Chapter  VI. 
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B.  OPTIMUM  LINEAR  FILTERING  AND  SMOOTHING 


In  this  section,  the  optimum  filter  is  first  derived.  Next,  the 
optimum  smoothing  routine  is  found  and  it  will  be  seen  that  it  contains 
filtering  as  a  subroutine.  The  model  is  shown  in  Fig.  2  with  W(k)  and 
U(k)  gaussian.  The  filtering  problem  then  is  to  estimate  X(k)  given 
all  values  of  Y  up  to  Y(k).  As  noted  in  the  last  section,  the 
optimum  estimate  is  the  conditional  mean  of  the  quantity  to  be  estimated. 
In  other  words,  if  it  is  desired  to  estimate  X(k),  write  the  density 

P[X(k)| Y(k) ,Y(k-l) . Y(l)] ,  and  find  its  mean.  Later,  it  will  be 

shown  that  this  mean  is  identical  to  the  mean  of  P[X(k)| Y(k) ,x(k-l)] , 
where  X(k-l)  is  the  estimate  of  X(k-l)  given  Y( 1 ) , . . . ,Y(k-l ) .  So 
our  problem  is  reduced  to  finding  this  second  mean. 

Using  Bayes’  theorem,  write 

P[X(k)]Y(k),X(k-l)]  ■  WtoW*)., *(*7.1)1  ffij-Q.I  (3.2) 

Pfx(k-l) ,y(k) ] 

Since  this  density  is  gaussian,  the  mean  will  be  the  value  of  X(k)  that 
minimizes  the  exponent  of  the  density. 

Note  that  x(k)  is  contained  only  in  P[Y(k) | X(k) ,X(k-l ) ]  and 
P[X(k) | X(k-l) ] .  Referring  to  Fig.  2,  it  is  seen  that 


P[Y(k)|x(k),X(k-l)]  =  P[Y(k ) | X(k ) ] 


N[HX(k),Kw(k)]  (3.3) 


where  is  the  covariance  matrix  of  W(k);  [N(A,B)  denotes  a 

normal  density  with  a  mean  vector  A  and  covariance  matrix  B].  The 
vector  X(k-l)  is  the  mean  value  of  X(k-l)  given  Y(k-1 ) , . . . ,Y( 1 ) . 

Then  the  mean  value  of  X(k)  given  X(k-l)  is  OX(k-l).  Let  the 

/v 

covariance  matrix  of  X(k-l)  about  X(k-l)  given  all  data  up  to  Y(k-l) 


be  K 


X(k-l) 


(The  subscripts  on  the  covariance  matrices  denote  the 


■  The  covariance  matrix  of  OX(k-l)  about  OX(k-l)  is 

*'  Since  X(k-l)  is  independent 


<$KX(k  (0*  is  the  transpose  of  v 

of  U(k ) ,  we  have 


P[X(k  '  I  X(k- 1  ■  -  N*r^X(k-l 


U(k-1 


OK 


X(k-1 


^  1 


(3.4 
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Assume  that  K, 


is  not  singular.  Thus,  the  product  can  be  written  as 


W(k) 

P[Y(k)|x(k)]P{x(k)jx(k-l)]  =  (const)  exp^||[Y(k)-HX(k)]tK^k)[y(k)-HX(k)] 

+  [X(k)-®X(k-l)]tK‘jk)[X(k)-®X(k-l)]|J 

(3.5) 

where 


K 


X(k) 


=  s4> 

X(k-l)v 


*  +  rKu(k-i)rt 


(3.6) 


The  exponent  will  be  minimized  when  the  gradient  of  the  quadratic  in 
Eq.  (3.5)  is  zero. 

If  Q  is  any  symmetric  matrix,  T]  1b  a  vector,  and  if 

«  =  (HX  -  Tj)1  Q(HX  -  Tj) 


then  the  gradient  of  4  with  respect  to  X  is 

grad  $x  =  2H*  Q(HX  -  Tj )  (3.7) 

Also,  the  gradient  of  a  sum  of  quadratics  is  the  vector  sum  of  the 
gradients  of  each  quadratic.  If  the  mode  of  Eq.  (3.5)  is  found  by 
setting  the  gradient  with  respect  to  X(k)  of  the  exponent  equal  to 
zero,  and  solving  for  X(k),  the  optimum  estimate  of  X(k)  is: 

*(“>  -  [»l  Ki(k)"  *  Kxfk)]  [■*  )*00  *  Kx!ki1'x(k-li]  <3-»> 


The  covariance  matrix  i*  found  from  Eq.  (3.5)  by  taking  the 

inverse  of  the  sum  of  the  terms  that  are  premultiplied  by  X*  and 
postmul tipi led  by  X  or 


*X(k)  -  S(k)H  *  Ki(K)] 


Note  th.t  Kj(k) 


is  not  a  function  of  Y. 
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If  the  estimation  procedure  is  stated  at  k  =  1,  Kx^  mU8t  be 
known.  Assume  that  the  dynamical  system  is  turned  on  at  k  =  -Mq  and 
the  initial  conditions  are  zero  at  that  time. 


Then, 


X(l) 


M  +1 
o 


=  2 

i=0 


M  -i+1 

$>  °  ru(i-MQ-l) 


h(i) 


o 

2 


i=0 


M  -i+1 


TK. 


U(i-MQ-1) 


The  one  remaining  quantity  to  be  specified  is  X(l). 


(3.10) 


(3.11) 


P[Y(l ) |  X(l ) ]  P[x(l)]  =  (const)  expjj-||[Y(l)  -  HX(l)]fc  Kw(1)fY(1) 

+  tx(D  -  xd)]1  SrJ^txd)  -  X(1)]}] 


(3.12) 

HX(1)] 

(3.13) 


where  X(l)  is  the  a  priori  mean  of  X(l)  (X(l)  is  zero  if  the 
initial  conditions  in  Eq.  (3.10)  are  zero].  Setting  the  gradient  equal 
to  zero  gives 


X(l) 


[“'  s!i>"  *  Kx(i;]  *  ["* 


Vi)T<1) 


Sci>S(1 


>] 


(3.14) 


The  matrix  is  obtained  from  Eq.  (3.9). 
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P[Y(k)|X(k),Y(k-l),...,Y(l)]  «  N[HX(k),Kw(k)]  (3.16) 

so  that  Eqs.  (3.6),  (3.8),  (3.9),  (3,ll),  and  (3.12)  alao  specify  &(k) 
given  Y(k) . Y(l). 

In  the  aaoothlng  problem,  It  Is  desired  to  estimate  the  value  of 
X(l)  given  Y(k) , . . . ,Y(l) .  Following  a  similar  method,  the  conditional 
density  is  considered.  To  obtain  the  estimate  in  recursive  form,  it  is 
desired  to  express  the  gradient  of  in  P[x(l)| Y(l) , . . . ,Y(k) ]  with  respect 
to  X(l )  as  a  function  of  the  gradient  of  in  P[X(l)| Y(l) , . . . ,Y(k-l)] .* 


P(X(l)|Y(l),...,Y(k-l)] 


(3.17) 


* Throughout  the  balance  of  this  report  all  gradients  will  be  taken  with 
respect  to  X. 
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Since  on  the  right  side  of  Eq.  (3.17)  X(l)  is  contained  only  in  the 
numerator, 

grad  \in  P[x(l)|Y(l) . Y(k-l)]}oe  grad  {in  P[x(l),Y(l) . Y(k-l)]} 

(3.18) 

Similarly, 

grad  jin  P[x(l)|Y(l) . Y(k)]|  *  grad  {in  P[X(l),Y(l) . Y(k)]| 

=  grad  jin  P[X(l)|Y(l) . Y(k-l)]} 

+  grad  {in  P[Y(k)|  X(l)  ,Y(l) . Y(k-l)]} 

(3.19) 

So  the  gradient  conditioned  on  data  to  time  k  is  obtained  by  a  simple 
addition  of 


grad  {in  P[Y(k)|x(l),Y(l) . Y(k-l)]} 

to 

grad  {in  P[X(l)|Y(l) . Y(k-l)]}  . 

Setting  this  new  gradient  equal  to  zero  and  solving  for  X(l)  giv"s  the 
value  of  X(l)  that  maximizes  P[X(l)|  Y(l) , . . . ,Y(k)] . 

The  smoothing  equations  are  given  in  Table  1.  Because  of  the  large 
amount  of  algebraic  manipulation  required,  the  details  of  the  derivation 
are  reserved  for  Appendix  A.  The  flow  diagram  for  smoothing  is  shown 
in  Fig.  3. 

C.  OBSERVATIONS  THROUGH  A  SECOND  DYNAMICAL  SYSTEM 

Very  often  the  observations  will  be  made  through  a  second  dynamical 
system.  For  example,  if  state-variable  estimation  is  used  to  estimate 
orbital  parameters  of  a  space  vehicle,  the  output  of  the  radio  link  will 
be  fed  to  an  analog- to-digital  converter.  These  converters  have  a 
limited  dynamical  (amplitude)  range,  and  a  narrowband  filter  usually  must 
be  placed  after  the  broadband  i-f  amplifier  to  reduce  the  noise  variations. 
The  filter  bandwidth  may  be  small  enough  to  influence  the  statistics  of 
Y(k)  (i.e.,  the  noise  may  no  longer  be  "white”  and  the  covariance  matrix 
of  HX(k)  will  be  changed). 
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TABLE  1.  SUMMARY  OF  THE  ESTIMATION  EQUATIONS 


Filtering 

Eq. 

No. 

m  -  [»'«;<„)«  *  Ki(k)]‘1[Ht>';(k)V('‘)  *  ^k,»s(k-x)] 

(3.8) 

Kx(k)  -  [“,Ki(k)H  *  Ki<k)]_1 

(3.9) 

Kx(k)  *  wi(k-i)*t  *  rau (k-!)1-' 

(3.6) 

M  -i+1  /M  -i+l\t 

KX(1)“S  *°  rVi-M-l )r  f  ) 

i=0 

(3.11) 

*<»  ■  [HtS(i)H  *  Ki(i)]'l[HtKi<i)vCl)  *  Ki{x)*<1)] 

(3.14) 

Smoothing  x( 1 ) 

id)  *  gk.J-1 

•[t\-l,‘>t,tT(k)<klk-1>{T<k)  -  ««(k-1)|x(l,4*  vj 

(A. 8) 

°k-i  -  S  {[“‘Si.)"  *  S(i)]’1  S(1)‘} 

(A. 5) 

sk  *fc-/,,'S(k)<klk-,),wVi  *  6k-l] 

(A. 11) 

jk  *fc-i»vS(k><kik-i){r(k)  -  "«(k-i)|x(x)4*  Vi] 

(A. 12) 

%  -  S<1)  *  *  lV[k.(2)  *  -r,IU(2)r,Ht]"1  “ 

(A.9) 

S(k)(k|k-‘)  -  *  n‘»(k-l)r,]“'  *  S(k) 

(A. 6) 

With  No  Earlier  Date 

J2  ‘  ‘'“'[“.(X)  *  “ntU(2)rV]'1’r<a)  * 

(A.  10) 

With  Earlier  Data 

J2  ‘  *V[S(2)  *  ,ir,tU(2)rV]'1  ,(2)  *  *  *X(1)**(0) 

where  X(0)  is  the  estimate  of  X(0)  given  all  the  data  to  k  =  0. 
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o 


FIG.  3.  FLOW  DIAGRAM  FOR  SMOOTHING. 


In  thia  section,  methods  for  finding  the  optimum  estimates  of  X(k) 
of  a  function  of  the  second  dynamical  system  will  be  given.  The  two 
dynamical  systems  will  be  combined  and  a  single  set  of  system  equations 
will  be  written.  Noise  will  be  added  to  the  output  of  the  second  system, 
and  it  will  be  shown  that,  as  the  variance  of  this  noise  is  reduced  to 
zero,  the  variance  of  x(k)  approaches  the  minimum  mean-squared  error 
of  &(k)  in  a  known  manner.  Then,  some  small  output  noise  may  be 
assumed  and  an  estimator  may  be  designed  using  the  combined  system 
equations  and  the  estimation  equations  of  the  last  section.  This  esti¬ 
mator  may  have  a  mean-squared  error  as  close  to  the  minimum  as  desired. 

Consider  the  two  dynamical  systems  shown  in  Fig.  4.  The  matrix  A 
is  an  input  matrix,  TJ  is  a  transition  matrix,  and  J  is  an  output 
matrix  (or  vector).  Define  the  following  quantities: 


M(k) 


X(k)‘ 

X(k). 


(3.20) 
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e  = 


(3.21) 


7  * 

N(k)  = 


B'  = 


G 

1  AH 

J _ 

0 

!*. 

0 

I  r_ 

A 

!°. 

W(k) 

U(k) 


(3.22) 

(3.23) 

(3.24) 


Then,  the  system  equations  for  the  double  dynamical  system  are 

M(k)  »  GM(k-l)  +  7N(k-l)  (3.25) 

Y(k)  -  B'll(k)  +  f<(k)  (3.26) 


The  vector  ft(k)  is  independent  of  ft(j)  for  k  ^  j,  and  the 
elements  of  fl( k)  are  assumed  gaussian.  The  elements  °*  Kfi(k)' 
a  diagonal  matrix,  are  assumed  to  have  a  known,  very  small,  upper  bound. 
This  small  noise  generator,  of  course,  is  always  present  in  physical 

situations  but  its  parameters  are  usually  unknown.  If,  for  design  pur¬ 
poses,  we  cboose  a  arbitrarily -whose  elements  are  the  bounds, 

we  are  assured  by  the  following  argument  that  the  mean-squared  error  in 
estimating  M(k)  from  Y  is  not  greater  than  that  error  calculated 
assuming  the  noise  covariance  was  at  the  bound.  Thus,  a  bound  on  per¬ 
formance  can  be  found  from  the  bound  on  noise  power. 

The  block  diagram  for  the  error 


B(k)  *  [M(kj  -  fi(k)] 


(3.27) 


is  shown  in  Fig.  5.  Let  fi(k)  be  the  output  of  the  best  linear  estimator 


for  K 


'fi(k) 


equal  to  its  upper  bound.  The  error  covariance  matrix  is  a 


sum  of  a  linear  (matrix)  transformation  on  K 


n(k) 


and  a  linear 
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FIG.  4.  THE  MODEL  CONTAINING  TWO  DYNAMICAL  SYSTEMS. 
D  is  a  unit  delay. 


MULTIDIMENSIONAL 

POTENTIOMETER 


FIG.  5.  THE  DYNAMICAL  SYSTEM  FOR  FINDING  THE  ERROR,  E(k). 


transformation  on  ^ ^ .  The  mean-squared  error  is  the  trace  of  this 

transformation  which  is  the  sum  of  the  trace  of  the  transformation  on 

K  .  and  the  trace  of  the  transformation  on  K  Reduction  in 

fl(k)  N(k-l) 

the  elements  of  X,.  v  will  decrease  the  mean-squared  error  since  each 


diagonal  element  of  the  transformation  on  K 


is  greater  than  or 


equal  to  zero.  Then,  the  designer  knows  that  he  can  lower  the  upper 
bound  until  the  mean-squared  error  is  within  specifications. 

If  the  physical  situation  assures  him  that  the  bound  is  higher  than 


is  actually  the  case,  the  estimation  equations  of  Sec.  B  may  be  used 
directly  by  substitution  of  N(k),  7,  0,  B\  tfi(k),  and  M(k)  for 

U(k ' ,  r,  F,  H,  W(k),  and  X(k),  respectively. 
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D.  ESTIMATION  WI1H  PARTIAL  DATA 

In  this  section  is  given  the  form  of  the  optimum  linear  estimate 
when  some  of  the  data  are  missing.  Many  practical  situations  arise  in 
which  a  complete  set  of  observations  is  not  available.  An  example 
would  be  estimation  of  a  rocket  trajectory  where  telemetry  was  temporarily 
lost.  Another  example  would  be  when  telemetry  is  lost  during  a  midcourse 
maneuver.  Frequently  the  system  output  is  telemetered  on  a  time-shared 
basis,  so  that  the  data  are  available  only  periodically. 

Assume  that  all  the  data  are  available  up  to  time  k  and  are  lost 
at  time  (k+l).  Then  X(k)  may  be  calculated  using  Eq.  (3.8).  Note  that 

i-1 

X(k+i )  =  41  x(k)  +  ^  ru(j+r)  (3.28) 

r=0 

Taking  the  expected  values  conditioned  on  Y(l) , . . . ,Y(k)  [since  the 
U(j+r)  have  zero  mean  and  are  independent  of  Y(l) , . . . ,Y(k)]  gives 

X(k+i)  *  ®lX(k)  (3.29) 


The  covariance  of  £(k+i),  *$(*+!  )(k+il  *  conditioned  on  Y(l) , . . .  ,Y(k) , 

is 

i-1 

l!S(k.l)lk*1lk)  '  »‘ltx(k)‘*1>t  *  I  »rnWr)(»rr)'  (3-30) 

r*0 

Assume  that  the  data  are  regained  at  time  (k+j).  Then  the  best 
estimate  of  X(k-t-l)  in  the  interval  from  (k+l)  to  (k+J-l)  is  given 
by  Eq.  (3.29). 

Since  new  data  are  available  at  time  (k+j),  write 


P[X(k+j)|Y(k+j),Y(k),...,Y(l)] 


P[Y(K+j)lX(k+j)]Prx(k+J 


P[X(k+j)lY(k) . Y(1)1P[Y( 

P[Y(k+j)  ,Y(k) . Y(  1 ) ] 


JlL 


in m 


(3.31) 


and 


P[X(k+j)|Y(k),...,Y(l)]  »N[*JX(k);  KX(k+j)(k+j|k)]  (3.32) 
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Then  the  optimum  estimate  oi  X(k+j)  is 

S(k*J)  -  *  Kio^tMio] 

•[Ht,S.(ktj)Y<k*J)  *  KiWk*j|k)43*(k)] 

The  covariance  of  x(k+j)  conditioned  on  the  new  data  is  now 

Kx(ktJ)  *  ["‘sUd11  +  K£(k*j)(k+jH 


(3.33) 


(3.34) 


The  estimation  is  continued  using  Eqs.  (3.6),  (3.8),  and  (3.9)  until  a 
new  loss  in  data  occurs.  Then  Eqs.  (3.30),  (3.33),  and  (3.34)  are  used 
again . 

If  smoothing  is  to  be  performed,  Eq.  (A. 13)  will,  of  course,  not 
contain  the  missing  data;  i.e., 


P[X(k+j ) I  X(l)  ,Y(2) . Y(k)]  -  n[MJX(k);HK^k+j)(k+j|k)Ht  + 


(3.35) 


The  new  gradient  added  by  Y(k±j)  is 

-ac‘(»J),H*[llK;i(liiJ)(k.j|k)»'  ♦  S(k.j)]  >  1 


Then 


id)  .  {c‘dJ)tHt[«s!(ktj)(kd|k)Ht  .  N,(k+J )]  t§k}  1 


."I 


•{c‘(«J)t«[mi(MJ)(k.j|k)*t  ♦  S(k,j)]  'r<k)  -  *  Jk| 

(3.36) 

When  the  next  observation,  Y(k+J+1)  arrives,  Eq.  (A. 8)  is  used.  It 
should  be  noted  that 


C.  4  =  $JC. 
k+j  k 


(3.37) 


and  that 


ck.j.i  ■  <["V  *  kx(k*j4i)]  (3-“> 
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E.  AM  EXAMPLE  OF  A  SMOOTHING  ESTIMATION 

A  simple  (but  typical)  dynamical  system  will  be  assumed.  The 
matrices  will  all  be  scalar. 

*  -  \  .  h  =  i ,  r  =  i 

The  covariances  will  be 

KU(k)  =  1(  Kw(k)  =  4 

It  is  desired  to  estimate  X(l)  based  on  the  observations  Y(l),  y(2), 

and  Y(3). 

The  random  numbers  U(k)  and  w(k)  were  obtained  from  a  gaussian 
random  number  table  with  the  variance  adjusted  to  1  and  l/4  respectively. 
The  numbers  chosen  were: 

k 
1 
2 
3 

From  Eq.  (2.3a) , 

X(k)  =  \  X(k-l)  +  U(k-l) 

Y(k)  =  X(k)  +  W(k) 

Then  X(k)  and  Y(k)  are: 

k  X(k )  Y(k) 

1  1  1.41 

2  1.27  1.61 

3  0.31  0.25 


uQO  w(k) 

0.77  0.41 

-0.33  -0.11 

-0.06 
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From  Eq,  (3.6)  and  Eq.  (3.9), 


KX(k)  "  [4  +  Kx(k)] 

Kx(k)  =  1  Kx(k-1)  +  1 

The  variances  used  in  calculating  X(k)  given  Y(l) . Y(k)  and  X(l) 

equal  to  zero  are  then: 

k 

1 
2 

3 

4 

From  (3.8), 

s(k) .  K5(k)[«(k) .  i 

and 


KX(k)  KX(k) 

0 

1  0.2 

1.05  0.202 

1.05  0.202  (steady  state) 


Jl2)Uu)=o  =  0'938 

a<3,|xU)-o  -  0  297 


From  Eq.  (A. 5) , 


'k-1 


k_1  -1 

r.  {[«  *  J  s'.) 

1=2 


C2  =  0.106 


From  Eq.  (A. 9) , 


+  4  + 


1 
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The  matrix  in  (A.IO)  is  the  covariance  matrix  of  X(l) 

given  that  U(k)  has  been  applied  long  enough  for  the  dynamical  system 
to  reach  steady  state  at  k  =  1.  From  Eq.  (2.3a), 


X( 1 )  =  ^  ®iU(-i) 
i=0 

Then  the  steady-state  variance  of  X(l)  is 


i=0 


and 

§2  =  5.53 


From  Eq.  (A. 10) 

J2  Ml[l  *  l]  Y(2)  +  4  X  Y(l) 

a  6.12 

where  X(l)  *  0.  Then 


X[1|Y(1),Y(2)]  =  |y||  =  1.11 


From  Eq.  (A. 6) 


Xy(3)  ( 3 | 2 )  =  1.30 


From  Eq.  (A.ll) , 


=  3.532 

From  Eq.  (A. 12),  =  6.11.  Then 

X[llY(l),Y(2),Y(3)]  s  1.10 

Equation  (3.14)  may  be  used  to  find  X[l}Y(l )]  *  1.18. 
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The  results  are  summarized  in  Fig.  6.  The  estimator  reaches  steady 
state  after  only  two  observations.  Further  examination  will  show  that 
the  weights  of  all  Y(k)  after  Y(2)  are  very  nearly  zero.  Such  very 
short  estimator  impulse  responses  occur  in  a  large  number  of  practical 
problems  with  typical  dynamical  systems.  As  will  be  shown  in  Sec.  D  of 
Chapter  VI,  the  estimator  may  always  be  truncated  after  a  small  number 
of  terms  except  in  the  rare  case  of  a  dynamical  system  with  an  extremely 
high  Q.  Usually,  short  impulse  estimators  for  even  these  cases  may  be 
found  by  choosing  an  equivalent  model  of  the  dynamical  system  with  a 
much  lower  sampling  rate. 


X(l) 


FIG.  6.  THE  ESTIMATES  OF  X(l). 
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IV.  APPLICATIONS  TO  DETECTION  OP  GAUSSIAN  SIGNALS 
IN  ADDITIVE  GAUSSIAN  NOISE 


In  this  chapter  the  theory  of  linear  state-variable  estimation  is 
applied  to  the  problem  of  detecting  a  gaussian  signal  immersed  in  additive 
gaussian  noise.  The  optimum  detector  contains  an  operator  that  gives 
the  best  estimate  of  each  signal  sample  value  during  the  detection 
interval  based  on  all  the  data  observed  during  the  interval.  Typically, 
several  thousand  data  points  may  be  observed.  In  the  usual  derivation 
of  the  optimum  detector  (see  Sec.  A),  a  matrix  of  an  order  equal  to  the 
number  of  data  points  must  be  inverted.  When  the  signal  can  be  repre¬ 
sented  as,  or  approximated  by,  the  output  of  a  noisy  dynamical  system, 
the  estimation  equations  of  Chapter  III  may  be  applied  directly.  The 
matrices  to  be  inverted  will  be  no  larger  than  the  order  of  the  signal- 
generating  dynamical  system,  regardless  of  the  number  of  data  points. 

Further  simplif ication  results  if  tho  impulse  response  of  the 
estimator  of  a  sample  value  is  truncated  when  the  error  covariance  matrix 
shows  that  there  will  be  little  reduction  in  mean- squared  error  by 
conditioning  on  additional  data  points.  A  near-optimum  time- invariant 
detector  is  then  shown  to  exist. 

The  chapter  begins  with  a  definition  of  the  likelihood  ratio.  Then 
follows  a  derivation  of  the  optimum  detector  that  requires  an  inversion 
of  a  matrix  of  high  order.  The  final  section  derives  the  near-optimum 
time- invariant  detector. 

Let  the  observations  Y(k)  be 

Y(k)  -  x1(k)  +  W(k)  (4.1) 

when  the  signal  x^(k)  is  present  (hypothesis  is  true),  where  w(k) 

is  additive  gaussian  noise.  When  no  signal  is  present  (hypothesis  w2 
is  true), 

Y(k)  -  W(k)  (4.2) 

The  quantities  Y(k),  x^(k),  and  W(k)  are  assumed  to  be  scalar. 
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When  'a  statistical  decision  is  made  between  the  presence  of  a  signal 
in  noise,  or  noise  alone,  the  best  decision  is  based  on  the  likelihood- 
ratio  test  [Ref.  10,  p.  318],  This  ratio  is  defined  as 

P[Y(l),...,Y(n)|w  ] 

«T<1> . V(°»  ■  F[T(1) . Y(n)|Ma]  (4-3) 

If 

* 

L[Y(1) . Y(n)]>p  (4.4a) 


we  say  a  signal  is  present,  and  if 

L[Y(1) . Y(n)]<p  (4.4b) 

we  say  there  is  noise  alone. 

Assume  the  signal  is  gauss i an  with  "zero  mean"  covariance  matrix 


*K<\ 


where 


XjU) 

*x(2) 


*1(n) 


Assume  that  the  noise  has  zero  mean  and  denote  the  covariance  of  the 

signal-plus- noise  vector  by  Ky  .  Then 

n 


Ky  +kx 

n  n  n 


(4.5) 


and 


P[Y(l) , . . . ,Y(n)| Wj]  -  (const)  exp{-  \  Y*  K^1  yI  (4.6) 

'  n 

P[Y(l),...,Y(n)|w2]  «  (const)  exp|-  Y*  K^1  YnJ  (4.7) 
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Since  the  logarithm  ie  a  single-valued  function,  one  might  Just  as  well 
consider 


In  L[Y(1 Y(n)]  =  (const)  -|y*  k"1  Yn  -  Y*  K^1  yJ  (4.8) 

n  n 

Or,  a  signal  is  said  to  be  present  if 

-  <  \  *  <  S1  >  ’'o  <4-9> 

n  n 

It  is  noticed  that  the  dimension  of  Yr  equals  the  number  of  data  points 

used  in  the  decision.  The  inversion  of  Ky  may  be  difficult  or  impossible 

n 

in  problems  involving  a  great  deal  of  data. 


A.  THE  LIKELIHOOD  DETECTOR* 


This  section  describes  how  to  calculate  the  left  side  of  inequality 
(4.9).  It  will  be  shown  shortly  that  this  calculation  requires  finding 
optimum  estimates  of  the  signal  conditioned  on  the  observed  data.  First, 
the  optimum  smoother  will  be  derived  in  a  form  different  from  that 
derived  in  Chapter  III. 

A  linear  estimate  of  x.  (i)  given  Y  will  have  the  form 

i  n 

n 

*i(1)  *  *ij  Y(J)  (4*10) 

j-i 

The  error  is 

n 

•(i )  -  xx(i)  -  £  Y(J)  (4.11) 

J-l 


and  the  mean-squared  error  is 
_  n 


n  n 


,J(1)  -  »„«>)  -  2  2  *  2  X  «-ia» 

J-l  k-1  J-l 


*The  following  treatment  is  due  to  T.  Kailath  [Ref.  8]. 
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where 


Rx(i-J)  -  E[x1(i)x1(j)]  (4.13) 

and 

Rx+w(j-k)  -  E  |[Xl(j)  +  W(j)][Xl(k)  +  W(k)]} 

2 

To  minimize  e  (i),  take  the  partial  derivative 
2  n 

.  o  .  2ax(t-j)  .  i  -  . . . 

k«l 

Thus  there  are  n  simultaneous  equations  giving  a  solution  for  the 
n  a  .  If  this  process  is  repeated  for  each  i,  the  result  may  be 
written  as  a  matrix  equation 


(4.14) 


(4.15) 


t**J]  \ '  \ 

which  has  the  solution 

A  -  iC1  (4.16) 

n  n 

The  dot  product  of  the  ith  row  of  A  with  Yq  will  give  the  minimum 

mean- squared- error  linear  estimate  of  x, (i),  given  Y  ,  or 

i  n 


AY_ 


\{l) 


Lai(n) 


Notice  that 


*  ■  (s  -  W 

\  n  n  /  n 


»  I 


n  n 


(4.17) 


(4.18) 
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Then 


S1  -  S1  A 

n  n 


(4.19) 


Substituting  Eq.  (4.19)  into  Eq.  (4.9),  we  can  now  say  that  a  signal  is 
present  if 


(4.20a) 


or 


Y*  K”1  X  >  7  (4.20b) 

n  w  n  o 
n 

As  the  length  of  the  sample  of  signal  and  noise  or  the  sire  of  n 
grows,  so  does  the  dimension.  In  practice,  it  is  impossible  to  invert 
matrices  of  dimension  greater  than  four  or  five  hundred.  In  a  typical 
planetary  radar  detection  problem,  the  signal  sample  may  be  30  min  long 

with  a  bandwidth  of  5  cps.  Then,  the  dimension  of  Kv  will  be 

4  n 

1.8  x  10  .  If  such  problems  are  to  be  solved  optimally,  a  more  efficient 

design  procedure  must  be  found. 

B.  A  NEAR-OPTIMUM  DETECTOR  CONTAINING  A  TIME- INVARIANT  FILTER 

The  remainder  of  this  chapter  will  show  that,  for  Y(k)  stationary 

and  n  sufficiently  large,  there  exists  another  matrix  representing  a 

time- invar! ant  filter  whose  mean-squared  error  averaged  over  i  is 

arbitrarily  close  to  the  mean-squared  error  of  A  averaged  over  i. 

Furthermore,  the  norm  of  the  difference  between  the  1th  row  of  this 

th 

new  matrix  and  the  i  row  of  A  tends  toward  zero  as  i  Increases. 

As  shown  in  Appendix  B,  this  implies  that,  as  m  increases,  the  detec¬ 
tion  error  probabilities  using  the  time- invariant  filter  are  arbitrarily 
close  to  those  of  the  detector  employing  the  filter  represented  by  A. 

The  linear  smoothing  equations  of  Chapter  III  may  be  used  to  find 
the  time- invariant  filter.  It  will  be  shown  that  only  a  small  sequence 
of  the  elements  of  Yr  are  required  at  one  time,  so  that  computer 
storage  requirements  may  be  considerably  reduced. 
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Assume  that  a  filter  with  impulse  response  is  represented  by  the 
.vector  A^.  Let  this  filter  give  the  optimum  estimate  of  ( i ) ;  i.e., 

(4.21) 


^(D  =  [X(i)]t  Yn 


(i) 


Let  another  filter  with  inqiulse  response  vector  B  give  another 
estimate  of  x^(i).  We  now  prove  that  a  bound  on  the  vector  difference 

between  A^  and  may  be  computed  from  the  difference  in  mean- 

-*•(  i  )  -►(  i ) 

squared  errors  of  A  and  B  . 

--»(  i  )  ,  t 

Theorem:  If  [A  |  Y  is  the  minimum  mean-squared  error  estimate 

^  >(  i ) 

of  x,  (i),  given  Y  .  and  [B'  '  ]  Y  is  some  other  linear  estimate 
1  n  n  — g 

of  X|(i)  with  an  increase  in  mean- squared  error  of  Ae  ove.  that 

of  [A^ 1 ^ ] t  y  ,  then 
n  * 


Ae5  1  [AA^]*  E{wnW^j  AA 


(i) 


-4<o  t«<,)),a»(1) 

where  *  ^(i)1  104  AA^  is  defined 

Aa^1)  ,  b^1)  .  J^1) 


as 


(4.22) 


.a,.  R 
ik  x+w 


<j-k>  - 2  2  R,  (j-*> 


Proof:  Let  Aa^  be  the  Jth  element  of  AA^. 

_____  n  n 

-  2  2  (*u  *  v»  <->-k) 

j*k  k»l 
n  n 

-  2  2  v 

J«1  k*l  J»1 

n  n  n  n 

■  2  2  ^i/Ak «...  (j-k>  * 2  2  2  "...  (j-k) 

J*1  k*l  J*1  k=l 

n 

‘  2  S  ^ij  Rx  (j‘i}  (4,23) 

J*1 
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By  Eq.  (4.15), 


Ae2(i) 


n  n 


>  >  Aa,  .Aa,,  R 

A  A  ij  ik  x+w 

j=l  k=l 


(j-k) 


=  [AA^V  Ky  AA^1  ^ 
n 


=  [AA^V 


k  +KW 
L  n  nj 


AA 


(i) 


i?  [AA^fl^  AA^ 
n 

■  4u)  “(1> 


for  white  noise  since  Kx  is  positive  definite. 

_  n 

2 

Next,  a  limit  for  e  (i)  given  Y(i-m) , . . . ,Y(i+m)  as  m  ■*  »  will 

be  derived.  For  m  <  <»,  the  mean-squared  error  will  be  larger  than 

this  limit  by  some  arbitrarily  small  amount  for  arbitrarily  large  a. 

The  length  of  the  impulse  response  of  the  filter  represented  by  the  1th 

row  of  A  is  n.  If  the  optimum  filter  with  an  impulse  response  2m 

2 

long  has  a  mean-squared  error  Ae  greater  than  the  limit,  then  the 

th 

theorem  above  assures  that  the  norm  of  the  difference  between  the  i 
row  of  A  and  this  second  filter  is  bounded  by 


provided  n  >  2m.  This  bound  will  be  used  in  Appendix  B  to  show  that 
the  difference  in  error  probabilities  between  the  detector  of  Sec.  A 
(whose  estimator  has  an  inpulse  response  of  length  n)  and  a  detector 
containing  an  estimator  with  an  impulse  response  2m  <  n  long,  is  also 
bounded . 

Most  of  the  proof  of  the  theorem  is  used  to  show  that  the  mean-squared 
error  of  the  estimate  of  x^(i)  conditioned  on  Y(i-m),. .. ,Y(i+m) 

(found  by  the  methods  of  Chapter  111  or  Sec.  A)  is  identically  equal  to 
the  steady-state  mean-squared  error  of  the  sanpled-data  smoothing  filter 
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(estimating  back  for  a  fixed  increment  m)  with  an  impulse  response 
2m  long  derived  using  the  more  classical  spectral-analysis  approach. 
This  fact  fits  one's  intuition,  but  it  is  not  a  trivial  problem. 


Theorem:  As  m  -*  <x>, 
i+m 


_  i+m  r 

'2<‘>  -  »x<°)  -  2  ~R,(0)  -  I 

j=i-m  -a 


r 

sx(f) 

J  sx+w^ 

-00 

SX+W^ 

df 


(4.24) 


where  S  (f)  and  S  (f)  are  the  signal  and  noise  power  spectra, 

x+w 

respectively. 

Proof :  Consider  the  equation  for  the  steady-state  mean-squared 
error  of  a  sampled-data  filter  with  an  impulse  response  2m  long. 


e2  «  R  (0)  + 

xv  ‘  2nj 


<f  jwz(  2  */J  2  \z*l 

'  '  Ljm-m  k=-m  J 

o 

-  vz>  2  *j(z'J  *  zJ>} 


dZ 

Z 


(4.25) 


where  &X(Z)  and  sx+v(z)  mee  the  sampled  signal  and  signal-plus- 

noise  apectra,  and  rQ  is  taken  to  be  around  the  unit  circle.  Since 

Y(t)  is  bandlimited  and  if  the  a  are  picked  to  minimize  the  mean- 

5  ^ 

squared  error,  e  will  approach  the  mean-squared  error  of  the 
optimum  continuous  noncauaal  filter  as  m  -*  ».  This  latter  mean- 
squared  error  is  well  known  [Ref.  11]  and  is 

2 

df 


r 

sx(£) 

J  SX+W^ 

•00 

R  (0) 
x  ' 


Equation  (4.25)  may  be  rewritten  as 


•'  ■  «,<»>  *  13 


1  X  ‘-J-0  k=0  J 


2m 


-  S, 


t<z>  2  *i(; 


(4.26) 


J-0 
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Write 


2m  2m 


A  «UI  AUI  y*  fiUt  «UI 

<fWz>  1  *jz'J  2  \z*  -5^^w<z>  2  I  *AzJ'k  r 

^  <-n  v-n  ^  J«0  k=0 


2n  2n 


*1  *j2  •kv.»j-k)T> 

J»0  k=0  (4.27) 


Using  Parseval's  theorem  for  discrete  systems  gives 


rr2m  *  r2m 

ihyl  *3(z'J*"-zJ'm>  vz)f -9zi  *j 

r  U-0  i  U=0 


z-J+n  Sx(z)  f 


2  ]>  Sj  RX[(J-»)T]  (4.28) 


In  Appendix  E,  it  is  shown  that  e  is  minimum  if 


2  %  -»,KJT)] 


so  that 


e2  -  Rx(0)  -  £  aj  Rx[(j-n)T] 


(4.29) 


Putting  Eq.  (4.15)  into  Eq.  (4.12)  gives 


•*(*>  * -  2  *u 

J-o 

[in  which  e2(i)  is  as  defined  in  Eq.  (4.12)],  or 


2  2 . 

e  r*  e  (i 


(4.30) 


(4.31) 


and  the  theorem  is  proved. 


SEL- 64-131 


If  we  are  able  to  derive  a  filter  giving  an  estimate  of  x^(i)  using 
as  data  Y(i-j ) , . . . ,Y(i+J ) , (i-m)  >  0,  (i+m)  <  n,  with  mean-squared 

error  e^i-m.i+m) ,  then  we  know  that  the  mean-squared  error  e*(l,n), 
of  x  (i),  given  Y(l) . Y(n),  is  bounded  by 

2 


e^(i-m,i+m)  g  e^(l,n)  §  Rx(o) 


c 

sx(f) 

J  sx+w^ 

-00 

sx+w(f) 

df 


(4.32) 


And  if 


2  ^  2<<  \  \ 
e  =  e^ti-m.i+m) 


R  (0)  + 

x 


r 

sx(f) 

j  SX+W^f) 

-00 

SX+W^ 

df 


is  very  small,  then  ^ (i-m, i+m)  will  be  very  close  to  A^(l,n), 
where  A^(j,X)  is  the  impulse  response  of  the  estimator  of  3^(1) 
using  data  from  time  equals  j  to  time  equals  i . 

The  signal  may  be  a  gaussian  output  of  a  dynamical  system  witn  random 
inputs,  or  the  signal  statistics  may  always  be  approximated  to  any  degree 
by  the  statistics  of  such  a  system.  Our  signal  generator  model  will  be 
a  noisy  linear  discrete  system. 

The  optimum  estimate  of  a  linear  operation  on  a  state  vector  is  that 
same  linear  operation  on  the  optimum  estimate  of  the  state  vector.  The 
Bayes  estimate  of  X(i)  is  the  conditional  mean  of  X(k),  X(i);  and 

the  conditional  mean  of  x^(i)  =  HX(i)  is 


Xj(i)  -  HX(i ) 


Equation  (A. 8)  (smoothing  with  earlier  data)  may  be  used  to  find 
&(i)  and  ( i ) ,  given  Y(i-m) , . . . ,Y(i+m) .  The  error  covariance  matrix, 
KX(i  )(i_m»l+m)  m»y  *>e  easily  calculated.  Then 

e*(i-m,i+m)  s  HKj^^i-m.i+i)  H*  (4.33) 

The  value  of  m  is  increased  until  the  upper  bound  of  Eq.  (4.32) 
approaches  sufficiently  close  to  the  limit  of  the  second  theorem.  This 
will  fix  the  length  of  the  impulse  response. 
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Equation  (A. 8)  may  be  written  in  the  form 


m 


X(i)  =  ^  EjY(i+j) 
J  S-Ifl 


(4.34) 


The  scalar  weight  of  Y(i+j)  in  the  estimate  of  £^(i)  is 


a  *  HE 

ij  J 

Notice  that  E  is  a  function  of  m  only  and  not  of  i.  Thus,  the 

J 

near-optimum  filter  is  time  Invariant. 

The  calculation  of  E.  is  greatly  sinqplified  by  noting  that,  in  Eq. 

/  \  i )  /  \ 

(4.16),  the  elements  of  A'  (i-m,i+m)  are  symmetrical  about  i. 

Then ,  only  the  first  (m+l)  Ej  need  be  calculated,  and  it  is  much  easier 
to  find  these  than  to  calculate  the  last  m  E^  directly.  Examination 
of  Eqs.  (A. 4),  (A. 8),  (A.ll),  and  (A. 12)  shows  that 


*i-Ci*V 

*1-1  -  Ci  *i<.)  “id)  *v 

*1-J  -Cl  Sd)  “id)  *id)  *id-D  "V 


*1-1  -  Cl  *id)  “id)  *id)  *id-i) 

*id-l*D  “V 


(4.35) 


Examination  of  Iqs.  (A. 4)  and  (A. 6)  shows  that 


Sd)d— .“■) 


*VSd)“  *  C) 


♦  I  CC*VC)('l->-1>"*Vi,>i 

j-2 


-1  (4.36) 


39  - 


SEL-64- 131 


where 


Qj-i  "  Kx(j-i)  Kx(j-i)  ®  ^4,3? 

The  procedure  then  is  to  calculate  e3(i-m,i+m)  for  successively 

— Z  1 

larger  values  of  m  until  Ae  is  small.  The  weighting  coefficients 
are  calculated  according  to  Eq.  (4.35),  and  the  detector  is  connected 
as  shown  in  Fig.  7.  Notice  that  the  first  and  last  m  x^(i)  are 
neglected.  These,  of  course,  could  be  calculated;  however,  for  n  »  m, 
little  is  to  oe  gained  by  the  additional  information. 


FIG.  7.  THE  NEAR-OPTIMUM  DETECTOR. 
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V.  LINEAR  FILTERING  OF  SIGNALS  WITH 
CONTINUOUS  UNKNOWN  PARAMETERS 


Magill  [Ref.  5]  has  given  the  form  of  the  general  solution  to  the 
estimation  problem  with  unknown  parameters,  and  has  developed  practical 
methods  of  implementation  for  gaussian  inputs  and  a  finite  parameter 
set.  No  general  implementation  has  been  developed  when  the  parameters 
may  be  chosen  from  an  infinite  set  or  for  the  optimum  linear  filter  with 
unknown  parameters  and  nongaussian  inputs.  This  chapter  modifies  Magill 's 
..solution  to  allow  practical  estimation  of  the  state  variable  or  signals 
with  a  dense  set  and  stationary  or  near-stationary  processes.  The  dis¬ 
tribution  over  the  parameter  set  is  not  restricted  to  gaussian  in  Magill 's 
solution  or  in  this  chapter. 


A.  MAGILL' S  SOLUTION  FOR  A  FINITE  NUMBER  OF  PARAMETER  VALUES 


If  the  state  of  nature  u  ia  to  be  estimated,  the  Ba 
for  a  mean- squared  loss  function  is 


itinate 


u  •  B[w|D']  *  f  u  P(w|D' )  du 


(5.1) 


u 

where  D*  is  the  data.  If  a  is  a  parameter  or  parameter  vector 
belonging  to  the  paraneter  set  Aq, 


and 


p(w|  D' )  «  J  p(u|d' ,a)  P(a|D')  da 

Ao 


On  defining 


fi  A. 


S(a) 


J  w  P(w|  D’  ,a)  dw 


(5.2) 


(5.3) 


(5.4) 


f. 
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and  interchanging  the  order  of  integration,  Eq.  (5.1)  is  found  to  be 


A  I  A 

u=r 


(a)  p(a|D')  da 


(5.5) 


In  other  words,  the  best  estimate  of  u  is  the  estimate  that  assumes 
a  is  true-weighted  by  the  probability  of  a  conditioned  on  the  data 
and  integrated  over  Aq. 

Magill  constructs  his  adaptive  filter  by  building  a  number  of 
estimators — one  for  each  member  of  the  parameter  set.  Then 

P[Y(l) . Y(k)|a  ]  is  evaluated  for  each  a^.  The  outputs  of  the 

10(0^  are  weighted  by 

P[Y(i) . Y(k)|a.  ]  P(a  ) 

P(ot|D*)  -HajTd) . TOO] - pTY7i7T7. ;?;*)] 

(5.6) 

A 

and  summed  giving  w.  Practical  methods  for  evaluating 
P[Y(l) , . . . ,Y(k) |ai]  in  the  state-variable  problem  have  not  been  worked 
out  for  the  dynamical  system  whose  inputs  are  nongaussian.  The  best 
linear  w(a)  can  still  be  chosen,  however. 


B.  FILTERING  OF  STATIONARY  OR  NEAR- STATIONARY  PROCESSES  WITH 
PARAMETERS  FROM  AN  INFINITE  SET 

The  estimator  of  a  state  variable  or  a  signal  with  a  given  a  [the 
value  of  the  signal  or  state  variable  at  time  k  will  be  denoted  by 
S'*  ,a)]  is  a  linear  combination  of  the  observed  Y(i);  i.e., 


x^k.tt)  «  ^(a)  Y(j) 

J»1 


From  Eq ,  (5.5) 

* 


i(k)  *  f  2  \j{a 

{ 


)  P[a!Y(l) . Y(r) ]  Y(j)  da 


S  v(j)  /akJ(  a)  p[a|  Y(l) . Y(r)]  da 

Jal  A 


(5.7) 


(5.8) 
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In  other  words,  the  optimum  weight  is  just  the  mean  weight  conditioned 
on  the  observed  data. 

In  Appendix  C  a  filter  is  derived  that  has  the  form 

k+k 

o 

Vk,a)  =  2  akj(a)  Y(J)  (5,9) 

j-k-ko 


The  theory  of  the  last  chapter  shows  that  the  estimate  given  by  this 
filter  is  arbitrarily  close  to  the  minimum  mean- squared- error  estimate 
for  stationary  processes.  The  weight  a^Ca)  is  given  by 


-  s 


si(a) 


(C. 33) 


where  and  are  the  signal  and  noise  power  in  the  frequency 

range  [(B/n)(i-l)]  to  [(B/n)i]  cps.  The  signal  is  assumed  to  be 
in  the  range  from  zero  to  B  cps.  The  constant  is  found  from 


sin  [^i(jT  -  kQT)] 
«(JT  -  kQT) 


(C.32) 


(C.31) 


The  noise-power  spectrum  is  assumed  to  be  known  so  that  there  is 
correspondence  between  the  elements  of  the  parameter  set  and  the  possible 
power  spectra  of  x^(k,a).  Then  the  optimum  weight  for  estimating  x^(k) 
will  be 


n  C  s 

v,  -  S  v  /  s-br  Pts‘!,'(1, . ,(r)]  dS‘  (5’10) 

i=l  {  1  1 


where  the  conditional  density  of  is  the  sum  of  all  the  densities  of 

[Q:S1(Q)  =  S. ]. 
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Write  G^  in  transfer  matrix  form  with  ( J )  as  the  output;  i.e., 


^(1)' 

Y(l) 

qi(2) 

Y(2) 

=  G. 

i 

1 

Qi(r) 

Y(r) 

(5.11) 


The  matrix  is  causal  and  therefore  triangular  below  the  main 

diagonal.  Such  matrices  are  nonsingular  if  there  are  no  zeros  on  the 
main  diagonal.  Since  Gi  is  time  invariant,  this  is  always  true,  and 

there  will  be  a  one-to-one  correspondence  between  [Y(l) . Y(r)]  and 

[q1(l),...,q1(r)].  Thus, 


PtSjYd) . Y(r)] 

P[S1|q1(l) . qt(r)] 


P[s1|qi(i),...»qi(r)] 

P[qt(l) . q1(r)|S1]  PfSj 

- p[q'1(i),:.":,q1Cr)T“ 


(5.12) 

(5.13) 


th 


There  are  n  G^ ,  each  with  bandwidth  of  B/n  so  that  only  every  n 
q^(j)  is  needed  to  find  the  density  of  [q^l) . q^r)],  or 

Plq.Uhq.dbq/an) . q.d)]  p[s.] 

,tSll’l<1) . ,ilr)1  -  F[q1(l).,l(.).ql(j«i) . ,4(r)l -  (5-,4) 


The  filter  Gt  is  sharp-cutoff  so  that  the  qi(j)  in  Eq.  (5.14) 
are  essentially  independent  and  the  following  is  true: 


P[Sjqi(l),. 


1 

r/n 

1  V  „2/ 

p[s4] 

[2n(S1+«1)]r/2n  ^ 

2(S  +N  )  Z  qi' 

_  J»1 _ . 

( 

r/n 

p[si]dsi 

s. 

2(S  +N  )  Z  V  ^ 
>1 

(5.15) 
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The  integral  in  Eq.  (5.15)  may  be  evaluated  directly  as  a  function 

2/  x 

of  L  c.;_  (nj )  and  r,  and  stored  ahead  of  time. 

Since 


Si  +  Ni 


=  1  - 


Si  +  Ni 


only  the  following  need  be  calculated: 

N. 


/  S1  *  *»  PtSll<ll<1) . ,1<r)1  *,,,(!) . ,l(r)1 

si 

/ 


!8t  + 


r/n 

"  2(Si  +  Mt)  ^  qi(nj) 


P[Si]  dSt  (5.16) 


2 

Equation  (5.16)  is  a  function  only  of  [r.Lq^nj)]  and  may  also  be 
calculated  ahead  of  time  and  stored. 

A  less  complex  procedure  usually  giving  almost  as  good  results  is 
maximum  likelihood  estimation  of  (Si  +  J^).  This  estimate  is  found 


by  solving  for  in 


d  |P[S  |q  (1) . «.(*)]} 

- 5T - 5 - - 


(5.17) 


Then 

r/n 

-  *<•,'.  v  *  2(8i  7  —f  1 =  0 

and 

r/n 

(Si  +  V  "  f  £  qi(nj)  (5,l8) 

J-l 

This  estimate  may  be  instrumented  by  an  integrate-and-dvuq>  circuit  or 
closely  approximated  by  a  square-law  device  followed  by  a  lowpass  filter 
with  a  bandwidth  of  l/r. 
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The  variance  of  the  estimate  of  (S^  +  1^)  is  [Ref.  10,  p.  261] 

"e.t  -  S(?)2  <S1  +  <5-: 

Take,  for  instance,  a  100-tap  line  with  20  narrowband  filters.  This 
adaptive  filter  can  begin  its  estimating  when  r  equals  100  and,  for 
all  practical  purposes,  it  will  be  converged  to  Weiner  optimum  when  r 
equals  200.  If  the  system  bandwidth  is  100  cps,  this  convergence  will 
be  obtained  at  the  end  of  1  sec  of  operation. 
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VI .  ESTIMATION  WITH  NONGAUSS IAN  INPUTS 


A.  INTRODUCTION 

In  this  chapter,  the  theory  of  Chapter  III  is  extended  to  include 
state-variable  estimation  when  either  the  inputs  to  the  dynamical  system 
or  the  output  noise  or  both  are  nongaussian.  The  same  general  approach 
used  in  Chapter  III  (finding  the  mode  of  the  conditional  density)  is 
used.  Since  this  density  is  no  longer  gaussian,  the  mode  is  not  neces¬ 
sarily  located  at  the  mean.  A  uni modal  density  or  one  with  a  unique 
maximum  will  be  assumed.  The  state  variable  giving  this  maximum,  of 
course,  is  the  maximum  likelihood  estimate.  When  Bayes  estimates  are 
made,  it  will  further  be  assumed  that  the  density  is  symnetrlc  about 
some  point. 

Most  of  the  discussion  in  this  chapter  is  concerned  with  the 
propagation  of  nongaussian  statistics  through  a  discrete  linear  dynamical 
system.  At  first  glance  (with  the  central  limit. theorem  in  mind),  one 
might  conclude  that  the  output  density  of  a  discrete  dynamical  system 
with  feedback  would  converge  to  a  gaussian  density  since  the  output  is 
the  sum  of  a  large  number  of  Independent  random  variables.  If  this 
were  the  ease,  then  linear  estimation  would  be  optimum  for  one  or  more  of 
the  atate  variables.  Unfortunately  (as  shown  in  the  next  section),  this 
is  never  true  for  a  tine- invariant  discrete  system.  In  Sec.  B,  necessary 
and  sufficient  conditions  for  the  output  density  to  be  of  the  "same  form” 
as  the  input  density  (i.e.,  the  output  density  is  the  input  density 
translated  and/or  with  a  change  in  scale)  are  also  derived. 

Section  C  describes  the  calculation  of  the  joint  density  of  the  state 
variables  and  the  observations.  Section  D  contains  a  discussion  of  methods 
for  finding  the  estimate  and  it  is  shown  that  near  optimum  estimation  may 
be  often  obtained  with  only  a  short  sequence  of  observations.  The  last 
section  describes  the  estimator's  asymptotic  behavior  as  the  signal- to- 
noise  ratio  is  increased. 

B.  PROPAGATION  OF  FIRST-ORDER  STATISTICS 

The  output  of  the  linear  system  will  be  assumed  scalar,  or  the 
first-order  density  of  only  one  of  the  outputs  will  be  of  interest. 
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Then  the  output  may  be  written  as  a  linear  combination  of  the  input 
random  variables,  i.e., 


m 


z(k,Jo) 


I  I 

i=l  j=0 


% 


Ug(k-j) 


(6.1) 


As  before,  it  is  assumed  that  the  u^,(k)  are  independent  and  have 
zero  mean  with  the  additional  assumption  that  for  a  fixed  £,  the  u^(k) 
are  identically  distributed  for  all  k.  Also  assume  that  for  each  4 
some  a^  is  not  equal  to  zero  (i.e.,  no  impulse  response  from  any  of 
the  inputs  to  the  output  is  equal  to  zero  for  all  time). 

A  theorem  will  now  be  proved  showing  that  the  cdf  of  the  output  of 
a  time- invariant  linear  filter  converges  to  a  gauss i an  cdf  if  and  only 
if  the  input  is  gauss i an. 


Theorem;  Let  a  stable,  time-invariant,  discrete  linear  dynamical 
system  and  its  input  be  described  by  Eq.  (6.1)  and  by  the  assumptions 
given  above.  Then  the  cdf  of  the  system  output  will  converge  almost 
everywhere  as  JQ  **  00  to  a  gaussian  cdf  if  and  only  if  the  input 
is  gaussian. 

Proof :  It  is  well  known  that  the  output  of  a  linear  system  is 
gaussian  if  the  input  is  gaussian.  So  only  the  "only  if"  proof 
will  be  given  here. 

It  is  assumed  that,  for  a  fixed  i,  all  u^(j)  are  identically 
distributed.  Then 


i  1  a2[\j  Ui(k-J>]  ■  1 

j=0  £=1  £=1 


l 

j  =0 


(6.2) 


If  the  system  is  stable, 
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* 

If  the  system  is  stable, 

Jo 

.  lim  V  a^  <  »  for  all  £ 
j  -*  00  zL  £j 

°  J=0 

Therefore, 

j  m 
o  o 

1 1  ■Vk-j)]  <*  (6-3) 

j*o  £«l 

l+i 

Define  Z^i(k,Jo) 

^(k.J,)  *  *(k»JD)  “  *£t  u^(k-i)  (6.4) 

Using  Eq.  (6.3)  and  Ref.  12,  page  236,  it  is  seen  that  z(k,Jo)  and 
Z^(k,J  )  converge  alaost  surely  as  JQ 

The  "Composition  and  Decomposition  Theorea"  [Ref.  12,  p.  271]  states 

that  the  sua  of  the  two  independent  random  variables  with  finite 

means  and  variances  is  gaussian  if  and  only  if  both  variables  in  the 

sua  are  gaussian.  Therefore,  z(k,J  )  and  z(k,°°)  are  gaussian 

o 

only  if  u^(k-i)  is  gaussian. 

At  this  point  it  is  sppropriate  to  briefly  examine  a  class  of  zero 
mean  distributions  with  finite  variances  that  have  the  interesting 


The  iapulse  response  of  a  discrete  systea  is  bounded.  If  it  is  not 
bounded,  a  bounded  input  (i.e.,  a  step  function)  will  give  an  unbounded 
output  ( som  of  the  coefficients  in  the  series  fora  of  the  output 
Z- transform  will  be  unbounded).  Of  course  if 


then 


2 

‘ij 


<  ®  . 
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property  of  being  invariant  as  they  are  passed  through  an  arbitrary 
discrete  linear  system.  Here,  "invariant"  means  only  a  change  in  scale 
factor. 

Definition:  A  Stable  Law  [Ref.  12,  p.  326],  The  cdf,  F(u),  is 

stable  if,  to  every  a.  >  0,  a„  >  0;  and  b. ,  there  correspond 

.  1  2  X 

constants  a^  >  0  and  b^  such  that 

T(a1VL£  +  bx)  *  F(a2u^  =  b2)  "  F(a3UiJ  +  b3^  (6.5) 

It  is  clear  the  invariant  distributions  belong  to  the  class  of  stable 
laws  since  it  is  required  that  they  do  not  change  with  an  arbitrary 
impulse  response  that  is  always  positive.  By  a  well-known  theorem  [Ref. 
12,  p.  327]  the  log  of  the  characteristic  function  of  a  stable  law  is 
given  by 

T*T 


log  f  (t)  -  ita  -  b|t|a  {l  +  ip 
i  1 


w(t,a)| 


(6.6) 


where 


and 


w(t,a) 


if  a  ^  i 
if  a  *  l 


(6.7) 


a  c  the  real  line 
b  6  (0,oo) 

o  <  a  s  2 
IP)  §  1 
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The  variance  of  u^(k)  may  be  found  by  taking  the  second  derivative 
with  respect  to  t  of  the  antilogarithm  of  Eq .  (6.6)  and  setting  t  =  0. 
This  derivative  is  less  then  infinity  only  when  a  =  2.  When  0!  =  2, 
u(t,a)  =  0.  Thus,  the  characteristic  function  of  invariant  cdf's  must  be 
of  the  normal  form 


fu  (t)  =  exp  (ita  -  b| tj  ) 


(6.8) 


C.  FINDING  HIE  JOINT  DENSITY  OF  THE  STATE  VARIABLE  AND  THE  OBSERVATIONS 
Since 

P[X(1) . X(k),Y(l) . Y(k)] 

*  P[X(1) . X(k)|Y(l) . Y(k)]  P[Y(l ) . Y(k)] 

the  x(l) , . . . ,X(k)  that  maximise  the  density  on  the  left  side  of  the 
above  equation  also  maximize  the  conditional  density  on  the  right.  The 
approach  used  here  will  be  to  first  calculate  in  a  convenient  form  the 
joint  density  of  the  state  variables  and  the  observations  for  nongaussisn 
inputs.  In  Chapter  III  this  was  relatively  simple  because  the  Joint 
density  of  gaussian  variables  is  gausslan,  but  for  nongaussisn  inputs, 
finding  the  Joint  density  is  really  the  heart  of  the  problem.  In  the 
next  section  methods  are  given  for  finding  the  density  maximum  (the 
optimum  estimate) . 

The  Joint  density  of  the  state  variables  alone  is 
P[X(l),X(2),...,X(k)]  -  P[X(l)]  P[X(2) | X(l ) ] 

•  P[ X( 3 )  |  X(l) ,X(2)]  ...  P[X(k)|X(l) . X(k)] 

(6.9) 

The  density  of  X(l)  is  Just  the  density  of  ru(0),  or 

p[x(i)]  -  p[ru(o)]  (e.io) 
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The  density  of  X(2)  conditioned  on  X(l)  is  the  density  of  TU(l) 
translated  by  OX(l).  Write  this  density  as 


P[X(2)|X(1)]  =  P[ru(  1 ) |  mean  =  «(l)]  (6.11) 


Because  of  the  Markov  property  of  X(i) 


P[X(3)|X(1)X(2)]  =  P[X(3)|X(2)]  =  P[u(2)  j  mean  =  <DX(2)]  (6.12) 


and 


P[x(k)|X(l) . X(k-l)]  =  P[ru(k-l)| mean  «  <Mt(k-l)]  (6.13) 


Since  the  elements  of  U(k-l)  are  independent,  it  ia  quite  simple 
to  write  down  the  right  side  of  Eq.  (6.13)  if  T  is  diagonal.  Then 


P[ru(k-l)|  mean  *  X(k-l)]  =  H  Pr7liu1(k-l)|mean  *  «i(k-l)]  (6.14) 

i 


where  $4(k)  is  the  ith  element  of  4>X(k). 


# 

By  this  notation,  we  mean  that  the  aero  mean  density  of  the  random 
variable  TU(l)  is  shifted  by  an  amount  equal  to  $X(l). 
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If  r  is  triangular,  there  is  dependence  between  the  elements  of 
the  noise  vector  added  to  4X(k-l)  and  therefore 


P[ru(k)|mean  =  4>X(k-l)]  ■  P[711u1(k-l)|mean  =  ^(k-l)] 


*  +  721Ul^k_1^ 


•  P[733u3(ll“l)!Bean  »  «3(k-l)  +  y^u^k-l) 


+  732U2^k-1^ 


•  P 


o  o  o 


♦„  (k-l) 


■  -1 
o 


*  I 


Ul 


■  P[x1(l)|«1(k)]  P[*a(k)|#2(k-l),*1(k)]  ... 

P[*m  (k)l%  . *s  -xOO]  (••«) 

Similar  relationships  may  be  written  if  r  can  be  partitioned  into 

# 

triangular  matrices,  or  if  the  rows  can  be  reordered  to  form  a  triangular 
matrix  or  a  matrix  that  has  triangular  partitions. 

• 

Here  the  matrices  that  are  to  be  partitioned  must  not  have  elements  of 
two  different  triangles  in  the  sms  row. 
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It  becomes  more  complex  if  no  such  partitions  are  possible.  For 


example,  suppose 


r  = 


711  712 


7  21  7  22 


(6.16) 


and  none  of  the  y  equals  zero .  Then 

4  J 


P[X(1)]=  jP^jU^O)]  *  P[712u2(0)]|  P[721u2(0) 

+  ^22U2(0)I711U1(0)  +  712U2(0)] 


(6.17) 


The  random  variable  7_  u  (o)  conditioned  on  7,,u, (0)  +  7,ou_(0)  is 

•i  1  11  1  1«  2 

not  independent  of  722ul^0^  conditioned  on  the  same  sum.  Therefore, 
convolution  cannot  be  used  to  find 


P[721U1{0)  +  722U2{0)I711U1(0)  +  712U2(0)1 

Consider  the  matrix  transformation 

Z(k)  «  2TlX(k) 


Then 


Z(k)  «  3T1*X(k-l)  +  H_1ru(k-1) 

»  H-15>H(k-l )  +  E_1ru(k-1)  (6.18a) 

and 

Y(k)  -  HEZ(k)  +  «(k)  (6.18b) 

are  the  equations  of  a  system  with  state  variable  Z(k)  having  a 
response  identical  to  that  of  the  system  described  by  the  equations 
(2.3a)  and  (2.3c).  If  P  is  nonsingular,  pick 

H-1  «  P"1  (8.19) 

Then  the  input  matrix  will  be  diagonal. 
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If  a  Bayes  estimate  of  Z(k)  is  made,  as  noted  in  Chapter  IV,  the 
Bayes  estimate  of  X(k)  is 


i 


x(k)  »Hz(k) 


(6.20) 


Since  the  Jacobian  of  a  matrix  transformation  does  not  depend  on  the 
multiplying  vector  [Ref.  13],  the  transform  of  the  maximum  likelihood 
estimate  of  Z(k)  will  maximize  the  conditional  density  of  X(k).  So 
Eq.  (6.20)  is  also  true  for  this  latter  type  of  estimate. 

If  T  is  singular,  then  a  nonsingular  H  1  can  be  found  that,  when 
multiplied  with  T,  gives  a  triangular  product.  This  is  best  illustrated 
by  an  example.  Let 


12 


01 


r 


and  let 


where  J1’*  is  the  ijth  element  of  ST1 . 

The  first  column  of  B  can  be  arbitrary  and  the  last  column  is 

o 

obviously  all  zero.  The  elements  b_„  and  b__  can  also  be  arbitrary, 

11  **  12  4* 

but  b  j  must  be  zero.  Then  5  and  i  must  be  picked  so  that 


«  o 


All  the  other  t1^  may  be  chosen  to  make  H  1  nonsingular. 

1 he  Joint  density  of  the  state  variables  will  now  be  used  to  derive 
the  joint  densities  of  the  state  variables  with "the  observations.  Two 
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cases  will  be  considered.  The  first  procedure  will  use  convolution  of 
two  independent  densities,  and  is  not  applicable  to  all  forms  of  dynamical 
systems.  The  second  is  more  general  but  in  some  cases  requires  more 
calculation. 

Case  I:  H  (a  vector)  is  different  from  zero  in  one  dimension  only, 
or  H  (a  matrix)  is  diagonal. 

Any  dynamical  system  with  a  scalar  output  may  be  put  in  a  form  where 
the  output  is  |iX^(k)  where  p  is  some  constant.  For  the  moment 
we  will  consider  only  these  systems,  or  those  where  H  is  a  diagonal 
matrix.  If  the  scalar  or  an  element  of  a  vector  output  is  a  linear  com¬ 
bination  of  several  of  the  x^(k)'s,  convolution  usually  cannot  be  used 
to  find  the  probability  of  the  linear  combination  conditioned  on  the 
x»(j),  for  j  less  than  k.  This  is  true  because  ,  in  general ,  the 
conditioned  x^(k)  will  not  be  independent. 

For  convenience,  the  treatment  in  Case  I  will  consider  scalar  outputs 
only.  Extensions  to  vector  outputs  will  be  obvious.  Since  P[X(l) , . . . ,X(k)] 
is  really  a  joint  density,  i.e., 

p{[xi(l),...,xjB(l)]  ...  [x1(k),...  .x^k)]} 
o  o 

and  W(l)  is  Independent  of  x^(l),  one  can  write 

P[  Y(l )  |  Xg(l )  t  •  •  •  »*B(k ))  (^)  ] 

o  o 

«  P[W(1)]  *  P[xl(l)|x2(l) . xjk)]  p[*2(l) . *B001 

o  o 

■P[W(1)1  *  P[x  (l),x2(l),...fxB(k)]  (6.21) 

o 

By  successive  convolution,  the  joint  density 

...,  Y(k-1 ) , . . , ,x^(k-l ) 
o 

x1(k),...,x||(k)] 


P(Y(l),x2(l),...,xa(l),  Y(a),xa(2),...,x  (2), 
o  o 
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may  be  found.  If  desired,  all  x^(j),  J  less  than  k,  may  be 
eliminated  by  taking  the  marginal  distribution  giving 

P[Y{1) ,Y(2) .... ,Y(k-l) ,x. (k) . xm(k)] 

1  “o 

Then,  note  that 

P[X(k)| Y(l) ,Y(2) .... ,Y(k)]  -  P[Y(k)|x(k)]P[x1(k) . xn(k) ,Y(k-l) . Y(l)] 

o 

-  P[w(k) | mean  -  HX(k)] 

*  P[Xx(k) . xm(k),Y(k-l) . Y(l)]  (0.22) 

o 

Equation  (6.22)  must  be  used  because  another  convolution  would  eliminate 
x^(k).  The  convolution  usually  can  be  written  down  by  inspection. 

Case  II:  There  is  no  restriction  on  H. 

The  rsmsinder  of  the  derivation  here  is  very  simple. 

P[Y(1),X(1) . X(k)]  -  P[Y(l)|x(l)]P[X(l), . . , ,X(k) 

P[Y(1),Y(2),X(1) . X(k)]  -  P[Y(2)|X(2)]P[Y(1),X(1) . X(k)] 

P[Y(1) . Y(k)  ,X(1) . X(k)]  »  P[Y(k)|  X(k)]P(Y(l) , . . .  ,Y(k-l)  ,X(l ) . X(k)] 

k 

-P[X(l) . X(k)]  [[  P[Y(2)|X(J)) 

j-1  (6.23) 

where 

P[Y(J)|X(J)3  -  P(W(j)|mean  -  HX(J)]  (6.24) 

Notice  that  there  are  more  x^(j)  m  Iq.  (6.24)  than  in  Kq.  (6.22),  and 
that  there  will  be  more  Integrations  if  the  marginal  densities  are  found. 
Then  Iq.  (6.22)  should  be  used  if  the  dynamical  system  can  be  described 
as  in  Case  1  and  if  marginal  densities  are  to  be  taken. 
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One  other  density  will  be  needed  in  the  next  section.  Assume  that 

the  system  is  of  the  type  considered  in  Case  I  but  that  P[Y(l) . Y(j), 

X(l) , . . . ,X(j ) ]  has  been  calculated  by  the  procedures  outlined  in  Case  II. 

P[Y(1) . Y(k),X(l),...,X(j)]  =  P[Y(1),...,Y(J),X(1) . X(j) 


•  P[Y(j+l)|X(j)]P[Y(j+2)|X(j),Y(j+l)]  ... 

•  P[Y(k)jx(j),Y(j+l),...,Y(k-l)]  (6.25) 


Note  that 


i-1 

X( j+i )  »  64X(j)  +  ^  <j>rru(j+r) 
r-0 


(6.26) 


The  following  set  of  equations  can  be  written: 


Y(J+1)  -  xx(j+l)  +  W(J+1) 


-  («n . ^JxU)  +  (7xl . 71b)U(J)  +  W(J+D 

o  o 

Y( J+2)  -  xx(j+2)  +  W(j+2) 

-  ♦  . . . 

O  0 


♦  (7n>""7lu  )HI(J+1)  ♦  W(j+2) 


Y(j+o  -  . Ji})x(j)  ♦  (•ii“;....f^'*#)ni(j)  ♦ 


(i-1)  (i-l)>T 


♦  (7n . 7llt)W(J*i-l)  ♦  »(>0 

o 


where  « 


^  is  the  gtth  element  of  6*. 


(6.27) 
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Note  that 


P[Y(j+i)|X( j)  ,Y( J+l) . Y(j+i-l)  ,U(  j) . U( j+i-l)] 

-  P[Y( j+i ) | X( j ) ,U( J ) , . . . ,U( j+i-1 ) ] 

-  P[w( j+i)|nean  -  . ♦I^)x(j) . ( 7 u . 7la)U( J+i-1 )] 

0  o 

(6.28) 

So,  using  all  the  lines  of  Eqs.  (6.27)  and  (6.28),  we  nay  find 

P[Y( j+l) , . . . ,Y( j+i ) ,U( j ) . U(j+i-l)| X( J)] 

i 

-  P[U(j)]P(U(j+l)]  ...  P[U(j+i-l))  [f  P[Y(j+n)|X(j),U(j) . u(j+n-l)] 

““1  (6.2P) 


D.  FINDING  THE  ESTIMATE 

This  section  contains  a  brief  description  of  several  Methods  for 
calculating  the  aode  of  the  joint  densities  found  by  Methods  given  in 
the  preceding  section.  The  probleM  is  to  find  the  Bayes  estinates  of 

X(l ) . x(k)  froM  the  observed  data  Y(l ) , . . . ,Y(k).  First,  several 

obaervations  are  Made  about  the  convergence  of  Bayes  estinates  of  llnear- 
dynaaic- systea  state  variables.  It  is  obvious  that  the  ne an- squared 
error  of  X(j)  will  converge  to  soae  value  as  the  nunber  of  observations 
Increases,  and  that  the  ae an- squared  error  aust  be  a  nonincreasing  function 
of  (k-j),  where  the  estiaate  is  conditioned  on  Y(l)  to  Y(k).  If  the 
(k+l)th  observation  increased  the  aean  square,  then  the  Bayes  estiaator 
would  ignore  Y(k+l).  Since  the  swan-squared  error  is  never  negative, 
it  aust  approach  a  Halt  ps  (k-j)  -*  00. 

If  the  estiaator  is  linear  and  the  noise  is  white  and  stationary, 
the  estiaator  will  be  stable.  For  exaaple,  let  the  noise  be  scalar. 

Then,  the  coaponent  of  the  aean- squared  error  of  x^(j)  due  to  the 
noise  will  have  the  fora 
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k 


i=l 


where  a^,(i)  is  the  value  of  the  estimator  input  to  x^(j)  output 
impulse  response  at  time  equal  to  i.  Since  the  sum  is  finite  (because 
the  Bayes  error  is  finite),  the  estimator  is  stable.  Similar  reasoning 
shows  that  a  linear  Bayes  filter  and  a  Bayes  predictor  are  also  stable. 

In  Chapter  III  there  was  an  example  of  a  linear  smoother  that  obtained 
near-optimum  performance  with  a  small  number  of  observations.  In  general, 
nonlinear  estimators  also  obtain  near-optimum  performance  with  a  truncated 
sequence  of  Y(k). 

It  will  now  be  shov:n  that  the  desired  length  of  this  truncated 
sequence  can  be  directly  related  to  the  rate  of  decay  of  an  initial 
condition  in  the  generating  dynasdcal  system. 

The  optimum  estimate  may  be  written  as 


xU)  *  J  *(J)  rtx(j)|»(D . TOO]  <u(i) 

x(4> 

•/  /  /  *(J)  K*(j)|rU) . r(k) ,u( j ) . u(k-i)] 

0(4)  O(k-l)  X(j) 

•  *tu(j) . U(k-i)|Y(l) . T(k)]  dX(4)  <10(4)  ...  dC(k-l) 

*  j  ...  j  *[X(4)|Y(1) . Y(k),c(4) . u(k-l)]  Y[0(4) . 0(k-l)| Y(l) . Y(k)]  dl>(4)  ...  au(k-l) 

0(4)  «(k-l) 


(6.30) 


Then,  the  magnitude  of  the  change  in  X(j),  when  it  is  conditioned  on 
one  more  observation,  Y(k+1),  is  less  than 
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max  max  E[X(j)|Y(l) . Y(k+l),U(j) . u(k)] 

U(j, . U(k)  ,Y(l ) . Y(k+1 ) 

-  E[X(j)|Y(l) . Y(k),U(j) . U(k-l)] 

The  value  of  E[X(l)jY(l) . Y(k+1 ) ,U( j ) , . . . ,U(k ) ]  may  be  found 

by  solving  the  set  of  simultaneous  equations 

P[X(j),Y(l) . Y(k+l),U(j) . U(k)]  -  0 

P[X(j),Y(l) . Y(k+l),U(j),...,U(k)]  =0  (6.31) 

“n 

Consider  the  i!th  of  those  equations.  From  Eq.  (6.29), 
dx  JT)  P[X(j)'Y(l) . Y(k+l),U(j) . U(k)] 

Mi 

k  k-j 

=  J]  P[u(i)]P(X(j),Y(l) . Y(  J  )  ]  !;  P[  Y(  J+  i)  |  X(  J  )U(  j ) , . . .  ,U(  J+i-1 )  ] 

i=j  i*l 

•  P[Y(k+l)|X(j)U(j) . U(k)] 

+  P[Y(k+l)|X(j),U(j) . U(k)]  ^  P[X(J),Y(1) . Y(j  )] 

a 

•  F  P[Y(j+i)|X(j),U(j) . U(j+i)]  *0  (6.32) 

1=1 

*  * 

From  Eq.  (6.29),  it  is  seen  that  a  change  in  X(j)  of  AX(j)  in 
P[Y(k)|x(j),U(j),... ,U(k-l)]  changes  only  the  mean  of  the  conditional 
density  by  an  amount  proportional  to 

•••  <J,H> 
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In  any  stable  dynamical  system, 


as  k-j  -»  oo.  in  practice,  it  is  found  that  this  norm  goes  to  zero  very 
quickly.  For  example,  if  a  and  b  in  the  system  of  Fig.  1  both  equal 
l/2,  for  k-J  =  8, 

(♦u)‘i5))  “<•»  -  135?  <J>  ♦  315  <6-33> 

Then,  the  magnitude  of  the  component  of  the  gradient  of  P[Y(k)|x(j), 

ll(k ) . U(k-l)],  with  respect  to  X(j)  in  the  x^^  direction,  will  be 

l/l024  the  maximum  magnitude  of  the  slope  of  P[W(k)]  in  that  direction. 
Clearly,  a  good  engineering  approximation  is  to  set  the  first  term  in 
the  sum  in  Eq.  (8.32)  equal  to  zero,  then  the  left  side  of  Eq.  (6.32) 
is  equal  to  zero- at  exactly  the  same  places 

P[X(J),Y(1) . Y(k),u(j) . U(k-l)  =  0 

i 

There  will  be  no  change  in  the  conditional  expected  value,  and  a  truncated 
sequence  gives  very  nearly  the  optimum  estimate. 

When  actually  obtaining  the  estimates,  it  is  usually  convenient  to 
use  the  logarithm  of  the  Joint  density.  From  Eqs.  (6.23)  and  (6.25), 

in{p[Y(l),...,Y(k),X(l) . X(J)]|  *  &i{P[X(l)]f  +  in{P[Y(l)|X(l)]|  ♦  ... 

+  £nJP[X(j)|X(j-l)]|  +  in|P[Y(j)|X(j)]| 

♦  in{P[Y(j+l) . Y(k)|X(j)]|  (6.34) 

Taking  the  derivative  of  Eq,  (6.34)  with  respect  to  x^(j)  *or  ^ 

gives  a  set  of  mQ  equations  of  the  form 

3  <n{P[X(  j ) !  X(  j-1 )  ]}  3inLPlYmi,XLl,li  d  in<P[Y(j>l),....Y(k)lX(j)]}  = 

dx.-U) 

1  1  1  (6.35) 
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If  the  Bayes  3(j-l),  given  Y(l) . Y(k),  is  known,  then  x(j)  is 

the  only  variable  in  Eq.  (6.35);  i.e.,  the  X(j-l)  that  maximizes  (or 
minimizes) 

P[X(1) . X(j-l),Y(l),...,Y(k)] 


also  maximizes 


P[X(l),...,X(j-l),X(j),Y(l),...,Y(k)] 


Ihis  can  be  seen  from  the  following  argument.  The  Bayes  estimate  of 
X(j-l)  obviously  maximizes  P[X(l) , . . . ,X(j-l) ,Y(l) . Y(k)],  and  the 

Bayes  estimate  of 


maximizes  P[X(l) . X(j),Y(l) . Y(j)] .  The  first  n  elements  of 

f}(j)  are  the  elements  of  X(j-l). 

If  surf ace- searching  methods  are  used  to  solve  Eq.  (6.35),  the 
surface  has  dimension  a When  £(j)  is  found,  it  is  put  in  the  like¬ 
lihood  functions  for  X(j+l),  and  the  process  is  repeated  until  all  k 
of  the  X( J )  are  estimated.  The  first  surface  search  [in  the  estimation 
of  X(l ) ]  will  use  the  initial  condition,  X(0). 

If  only  filtering  is  desired,  the  maximum  likelihood  estimate  may 
be  made  by  using  the  density  described  in  the  last  paragraph  of  Sec.  C, 
or  surface  searching' may  be  used  to  solve  equations  of  the  form 


a  lnlP[X(j)|Y(l)^...,T(j)]}  ,  a  in{p^Y(|)|X(j)jL  p  (6 >36) 


notice  that  P[X(j)|  Y(l), . . .  ,Y(j)]  has  the  property  seen  in  Chapter 
III.  The  observed  Y(l)  will  appear  only  in  the  mean,  and  the  other 
moments  do  not  depend  on  Y(l).  The  surface  to  be  searched  is  again  of 
dimension 
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Also,  the  density  necessary  for  maximum  likelihood  smoothing  with 
earlier  data  may  be  obtained  by  writing 

P[X(j),Y(l) . Y(k)]  =  P[x(  j )  |  Y  ( 1 ) . Y(j-l)  P[Y(  j  )  |  X(j  )  ] 

•  P[Y(j+l),...,Y(k)|X(j)3  (6.37) 

The  surface  is  also  m  -dimensional. 

o 

If  linear  estimation  of  the  X(j)  are  made,  this  should  be  a  good 
first  guess  for  the  surface  search.  In  multiple-mode  problems  with 
reasonably  good  signal-to-noise  ratios,  it  would  be  an  aid  in  starting 
on  the  correct  mode. 

If  the  dimension  of  the  input  to  the  dynamical  system  is  smaller  than 
the  dimension  of  the  state  vector,  it  may  be  simpler  to  use  Eq.  (6.28) 
directly  and  estimate  the  U(k).  This  would  reduce  the  dimensions  of 
the  surface  to  be  searched,  and  there  would  be  no  need  for  taking  marginal 
distributions.  Then,  the  optimum  estimates  of  the  X(i)  are  given  by 

X(2)  =  4>X(l)  +  0(1) 


X(j)  =  OX(J-I)  +  U(j-l) 

where  X(l)  is  the  known  initial  condition. 

Another  calculation  aid  is  the  fact  that,  with  many  input  densities 
and  many  dynamical  systems,  P[Y(k ) | X(j ) ,Y( j+1 ) , . . . ,Y(k-l ) ]  will  appear 
very  gaussian  if  k-j  is  greater  than  4  or  5  and  if  W(k)  is  gaussian. 
This  is  especially  true  for  moderately  high  signals  (see  the  next  section), 
or  very  high  gaussian  output  noise.  Then  the  methods  of  Chapter  III  may 
be  used  to  calculate  the  conditional  density. 

Several  short  cosments  are  offered  on  finding  the  mode  when  the 
solution  is  not  put  in  the  form  of  Eq.  (6.35).  For  example,  perhaps 
only  a  small  number  of  data  points  are  available  and  the  joint  density 
is  put  into  the  form  of  Eq.  (6.23).  If  none  of  the  x^(j)  i*  integrated 
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out,  and  a  surface  search  is  run  over  all  the  X(j),  it  is  often  helpful 
to  note  that  the  mode  will  be  at  the  origin  for  all  Y(j)  equal  to  zero 
and  P[u^(j )]  symmetrical  about  zero  [W(j)  is  assumed  gaussian]. 

Then  the  Y(j)  can  be  moved  out  toward  their  observed  values  in  small 
steps,  thus  moving  the  mode  in  small  steps  from  a  known  position.  The 
surface  search  will  be  over  a  much  smaller  area  and  the  total  calculation 
will  be  greatly  reduced.  If  there  are  multiple  modes,  the  correct  one 
will  be  tracked. 

Once  Eqs.  (6.23),  (6.28),  or  (6.34)  have  been  found,  the  surface 
search  is  a  routine  problem  for  the  computer  programmer.  A  common 
procedure  is  the  method  of  steepest  descent  (or  ascent).  The  gradient 
is  evaluated  at  a  point  and  a  move  is  made  in  the  direction  of  the 
gradient  to  a  new  point.  The  gradient  is  again  evaluated  and  the  process 
continued  until  the  gradient  magnitude  is  sautll.  Gradients  of  either 
the  log  of  the  density  or  the  density  itself  may  be  used.  Another 
common  procedure  is  to  move  along  a  coordinate  axis  until  there  is  no 
increase  in  the  density.  This  process  is  repeated  in  turn  on  each  axis 
until  the  mode  is  reached. 

E.  THE  ASYMPTOTIC  BEHAVIOR  OF  THE  ESTIMATORS  AS  THE  SIGKAL-TO-NOISE 

RATIO  IS  INCREASED 

This  section  contains  a  discussion  of  the  asymptotic  behavior  of 
estimates  for  systems  with  nongauss i an  inputs  as  the  signal- to- noise 
ratio  is  increased.  It  will  be  shown  that  linear  estimation  is  very 
nearly  optimum  for  high  signal- to-noise  ratios  for  one. or  more  states 
of  the  dynamical  system.  In  special  cases,  it  can  be  shown  to  be  nearly 
optimum  for  all  states  of  the  system.  This  property  of  estimators 
assumes  special  Importance  because,  in  the  majority  of  cases,  there 
probably. will  be  a  strong  signal  and  weak  noise. 

Denote  by  X  (j)  all  elements  of  X(j)  that  nay  be  measured  directly 
o 

or  calculated  exactly  knowing  Y(l) , . . . ,Y(k) ,  k  2  J,  when  the  covariance 
matrix  of  W  is  zero.  Using  the  techniques  of  Sec.  C,.  one  can  find 

P[Xo(j),Y(l) . Y(k)].  Let  S^(j)  be  the  elements  of  the  Xq(j) 

estimation,  XQ(j).  How.  take  a  multidimensional  Taylor  series  expansion 
of  £n{p[Xo(j),Y(l) . Y(k)]}  about  SQ(j). 
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£n{P[Xo(j  ),Y(1 ),...,  Y(k)]|  =  in{P[Xo(j),Y(l) . Y(k)]J 

d2  in  jp[X  ( J )  ,Y(l ) , . . .  ,Y(k )  ]}| 


♦12 

g.h 


c*x  (j  )dx  ( j) 

g  h 


[xg(j)  -  ^g(j)][xh(j)  -  fch(j)l 


xg(j)  ■  *g(j) 


xh(j)  =  \(j) 


+  higher  order  terms 


(6.38) 


Notice  that  the  term  containing  the  first  partial  derivatives  is 
zero  because  these  derivatives  are  evaluated  at  the  mode.  Let 


d2  in{p[X  (j)  ,Y(l),...,Y(k)]jj 


dx  (j)dx  (j) 

g  h 


VJ) 


VJ) 


■  *,(j) 
-  VJ) 


Then  for  small  lx  (jj  -  ft  (j)| 
g  g 

P[X  (j),Y(l) . Y(k)]  s  P[X  (j),Y(l),...,Y(k)] 

o  o 


exp 


{-  i  2  Vx«lj) 

'  gih 


«g(j)][xh(j)  -  fth(j)]J 
(6.39) 


Or,  over  a  small  region  about  the  mode, 

* 


P[Xo(j)|Y(l) . Y(k)] 


follows 


the  gaussian  density. 

By  picking  the  elements  of  arbitrarily  small  and  using 

Tchebysheff ’s  inequality,  |x^(j)  -  ftg(j)|  may  be  bound  as  small  as 
desired  with  any  probability  less  than  one.  So,  with  high  probability, 
Eq.  (6.39)  will  describe  the  joint  density.  Under  these  conditions, 
linear  estimation  is  very  nearly  optimum. 


This  treatment  was  suggested  by  the  method  used  in  Ref.  14. 
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VII.  CONCLUSION 


A.  SUMMARY 

A  solution  has  been  given  for  the  problem  of  filtering  a  stationary 
process  with  unknown  parameters  when  the  parameters  come  from  some 
infinite  set.  It  has  been  shown  that  the  optimum  estimator  weights  are 
averages  of  functions  of  the  parameters  where  the  averages  are  taken 
with  respect  to  the  parameter  space  conditioned  on  the  observations. 

It  is  seen  that,  for  stationary  or  near- stationary  processes,  the 
parameters  enter  into  the  solution  in  the  form  of  simple  functions  of 
the  signal  and  noise  spectrum.  Optimum  methods  for  measuring  the  spectral 
averages  were  given. 

The  theory  of  state-variable  estimation  has  been  extended  to  include 
nonlinear  estimation  for.  nongaussian  inputs.  First,  it  was  shown  that, 
even  in  steady  state,  the  state  variables  are  nongaussian  for  nongaussian 
inputs  if  the  discrete  system  is  stable  (a  bounded  output  if  the  input 
is  bounded).  Thus ,  in  general,  linear  estimation  is  optimum  only  for 
gaussian  inputs,  the  estimation  approach  used  was  to  find  the  mode  of 
the  state-variable  density  conditioned  on  the  observations,  in  which  the 
key  problem  was  to  obtain  the  density  in  a  convenient  form.  The  Markov 
property  of  the  state  variables  was  used  to  simplify  this  very  complicated 
density,  and  surface- searching  methods  were  used  to  find  the  maximum.  It 
has  been  shown  that  near  optimum  (linear  or  nonlinear)  estimates  may  be 
made  of  the  state  of  many  dynamical  systems  using  only  a  short  sequence 
of  observations,  and  the  length  of  this  sequence  may  be  related  directly 
to  the  rate  of  decay  of  initial  conditions  in  the  dynamical  system. 

A  new  approach  to  linear  estimation  of  state  variables  has  also  been 
presented.  It  shows  the  close  relationship  between  the  theory  of  pattern 
recognition  in  a  random  environment  and  state-variable  estimation.  The 
theory  is  a  straightforward  extension  of  Abramson  and  Bravenaan's  learning 
theory  [Ref.  7],  Their  mean  and  covariance  matrix  equations  are  almost 
Identical  to  the  filtering  equations  presented  here.  The  theory  of 
estimation  for  the  case  where  the  observations  are  taken  through  a  second 
dynamical  system  was  presented  and  a  suitable  solution  was  given. 
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The  linear  estimation  theory  was  used  to  simplify  the  procedure  for 
detecting  a  gaussian  signal  in  gaussian  noise.  The  optimum  detector 
contains  an  operator  that  gives  the  best  estimate  of  all  sample  values 
during  the  detection  interval.  Ordinarily,  a  matrix  of  order  equal  to 
the  number  of  observation  sample  values  must  be  inverted.  It  was  shown 
that  with  state-variable  estimation  these  signal  estimates  could  be  made 
by  inverting  small  matrices  of  a  fixed  order  independent  from  the  number 
of  observations.  A  near-optimum,  time- invariant  detector  was  derived. 

B.  SUGGESTIONS  FOR  FUTURE  WORK 

Cox  [Ref.  6]  has  investigated  state-variable  estimation  when  the 
system  is  nonlinear  and  the  inputs  are  gaussian.  By  contrast,  the  present 
study  discusses  estimation  when  the  system  is  linear  and  the  inputs  are 
nongaussian.  An  obvious  area  for  further  work  is  estimation  when  the 
system  is  nonlinear  and  the  inputs  are  nongaussian.  An  approach  similar 
to  Chapter  VI  might  be  fruitful. 

A  problem  that  is  considered  by  the  author  to  be  one  of  the  more 
important  unsolved  practical  problems  is  the  detection  of  a  signal  when 
the  signal  parameters  are  unknown  and  drawn  from  an  infinite  set.  This 
is  the  type  of  signal  that  is  encountered  in  practically  all  space 
communications.  The  receiver  is  not  interested  in  the  exact  signal 
or  signal  parameter,  but  only  in  knowing  if  the  signal  is  present. 
Intuitively,  it  is  felt  that  a  form  of  a  near  optimum  detector  would 
correlate  the  signal  and  noise  with  the  output  of  the  adaptive  filter  of 
Chapter  V.  This  remains  to  be  proven,  however.  This  work  should  then 
be  extended  to  include  two  signals  transmitting  binary  information, 
where  the  parameters  are  drawn  from  two  Infinite  but  not  necessarily 
disjoint  sets.  An  example  would  be  frequency-shift  keying  when  the 
doppler  shift  was  greater  than  the  keying  shift. 
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APPENDIX  A.  DERIVATION  OF  THE  SMOOTHING  EQUATIONS 


This  appendix  gives  the  details  of  the  derivation  of  the  smoothing 
equations.  It  is  desired  to  find  the  solution  of 

gradxdjlin  P[X(l)|  Y(l) , . . .  ,Y(k-l)] 

+  «r*dX(l)  in  prY(k)|x(l),Y(l) . Y(k-l)]}  *  0 

(3.17c) 

The  density  P[Y(k)| X(l) ,Y(l) , . . . ,Y(k-l)]  may  be  specified  by  its  mean 
and  covariance  matrix.  From  Eq.  (2.3a)  it  is  seen  that  E[x(k)|x(l), 

Y(l) . Y(k-l)]  is  equal  to  «[X(k-l)|  X(l)  ,Y(l) . Y(k-l)],  since 

W(k-l)  has  zero  mean.  From  Eq.  (2.3c)  E[Y(k)|X(l),Y(l) . Y(k-l)] 

is  equal  to  H<&E[X(k-l)|X(l),Y(k),. . .  ,Y(k-l)]  since  U(k)  also  has 
zero  mean.  The  solution  to  the  filtering  problem,  Eq.  (3.8),  may  be 

used  to  find  E[X(k-l)|X(l),Y(l) . Y(k-l)]  with  X(l)  substituted 

for  X(l)  as  the  initial  condition  in  Eq.  (3.14),  i.e.,  for  k  »  2, 

*(a)  .  [h*  sfa)a  ♦  «ij2)]  1  [«*  S(2)v(»)  *  *i(t)**U>]  <*•»> 

where 

KX(2)  *  rKu(l)  *  (A. 2) 

The  matrix  *x(k)  w“  *lv*n  b?  **>•  (3-6)  for  k  >  2.  % 

To  si^>lify  notation,  Eq.  (3.8)  can  be  written  as 

X(k-l)  -  AklT(k-l)  +  Bk-ll(k-2)  (A. 3) 

Then,  iterating  on  Eq.  (A. 3),  with  (A.l)  as  a  start,  write  the  relations 

X(2)  .  A2Y(2)  ♦  B2X(1) 
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*(3) 

2(4) 


A3Y(3)  +  B3A2Y(2)  +  B3B2X(1) 


=  A4Y(4)  +  B4A3Y(3)  +  B4B3A2Y(2)  +  B^B^l) 


=  X(4) 


X(l)-0 


+  B4B3B2X(1) 


More  generally, 


where 


X(k-l)  =  X(k-l) 


x(i)*o 


+  ck-lx(1) 


(A. 4) 


Ck-1 


=  B  B 
k-lk-2 


(A. 5) 


and  where  X{k-l)j ia  the  estimate  of  X(k-l)  from  the  filter 
equations  when  X(l)  is  zero. 

It  can  easily  be  seen  that  the  covariance  matrix  of  Y(k)  given  all 
observations  to  time  k-1  is 

Kr(fc)<klk-l)  ‘  "[“xu-i/  *  r*o<K-i)  rt]at  *  S(k)  <*•«> 

Then 

grad  £njp[Y(k)|X(l),Y(l),...,Y(k-l)]} 

-  -ac*^**  Ki(k)(klk“l)  tY(k)  "  H**(k”l)l 

This  gradient  is  added  to  the  gradient  of  £n{p[X(l)| Y(l) , . . . ,Y(k-l)]| 
and  the  susi  is  set  equal  to  zero  to  get  a  solution  for  X(l).  The 
equation  to  be  solved  has  the  form 

Vis<*>  -  Vi  *  <-/  ■'  S(k)<klk-‘>  "“vi*'1’ 

‘  <-/  »'  Kv(k)Ckl k_li  "*i(k-1)|x(D-. 

-  a*  «;}k)(k|k-l)  Y(k)  -  0  (A. 7) 
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from  which 


S<»  -  [<-!»'  »'  S(k)<klk-l>  »«k-l  *  §k-l]  1 

•  (°k-/  Ki(k)<l‘lk-1)|,(k)  -  "»*(k-1>|x(l)rf)  *  Vl] 


(A. 8) 


The  quantity  §  k  ^  i«  the  part  of  the  gradient  of  the  exponent  of 


P[X(1)]  ...  P[Y(k-l|X(l),Y(l),...,Y(k-2)] 


that  is  multiplied  by  X(l),  and  is  the  part  of  the  same  gradient 

that  is  not  Multiplied  by  X(l).  For  example, 


KxSl) 


*  *v[v 


2) 


Hr*0(2)r 


V]  1H4> 


(A. 9) 


and 

J2  -  »v[s(2)  *  anSI(2)rV]'lj<a)  *  atS(i)  *  S(.)i(1)  (A-10) 

More  generally, 

§k  -  [<-!»'» tS(1.)(klk-l)  “Vl  *  Vl]  (A'U) 


and 


Jk  *  [Ck-i*,"tS(k)(klk-1>  {?<k)  -  ■«(k-1)|x(1)-)|*  Vi]  <A‘ia) 

If  data  before  Y(l)  are  used,  Sq.  (3.19)  can  be  Modified  to 


*(l(l)|fc(0)tT(l),T(*) . TVk).T(0),. ..,»(-■)] 

.  f[X(l)|£(0),T(0) . ▼(-«)]  HT(1)|X(1),1(0),T(0) . T(-»)l  K8(0),T(0) . T(-»)] 

.  f[T(l)l  1(1 )  ,T(I)  .T(0) . T(-.J1  ...  >[t(k)l»(l).T(l).T(i) . TU-»),T(0),,;...TC-aii 

»{8(0),T(1> . T(k),T(0) . T(-»)1 

(A. 13) 
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Note  that 


P[X(l)|x(0),Y(0) . Y(-n)]  *  P[x(l)|x(0)]  (A. 14) 

P[Y(l ) |  X(l ) ,X(0) ,Y(0) , . . . ,Y(-n) ]  -  P[Y(l) | X(l ) ]  (A. 15) 

Also, 

P[Y(k)|x(l),Y(l),Y(2),...,Y(k-l),Y(0) . Y(-n)] 

has  mean  H4>X[k-l|  Y(-n) , . . .  ,Y(k) ;  X(-n-l)  »  0] ,  where  X(k-l)  is  the 
mean  given  in  Eqs.  (A.l),  (A. 2),  and  (A. 3)  [this  can  be  seen  by  the 
same  reasoning  as  that  used  to  derive  Eqs.  (A.l),  (A. 2),  and  (A. 3)].  The 
covariance  matrix  was  given  by  Eq.  (A. 6).  Thus,  the  only  term  of  the 
gradient  in  Eq.  (3.17)  that  will  be  changed  will  be  the  term  due  to 

p[x( 1 ) | x(o) ] . 


APPENDIX  B.  THE  CHANGE  IN  THE  DENSITY  OF  THE  CORRELATOR  OUTPUT 


— 577 

This  appendix  presents  the  change  with  Ae  (i)  of  the  correlation 

output  density  of  the  near-optimum  detector  of  Chapter  IV,  and  gives 

a  method  for  answering  the  very  important  question  of  how  small  to  make 
— 2 

Ae  .  The  law  of  large  numbers  shows  that  the  correlator  density  is  very 
nearly  gaussian  for  large  n  so  that  only  the  change  in  the  mean  and 
variance  need  be  discussed. 

Before  proceeding  further,  we  illustrate  the  well-known  fact  that 
error  is  uncorrelated  with  and  hence  independent  of  the  estimate.  From 
Eq.  (4.31), 


e2(i)  -  Rx(0)  -  [A*l,]t  *X+WA(1)  (B.l) 


I 


Since 


xx(i)  ■  fij(i)  +  e(i) 


(B.3) 


e2(i)  -  xj(i)  -  *J(i)  -  ast^i)  e(i)]  (B.3) 

However, 

2^(0  -  [A(l))t  Rx>wA(i)  (B.4) 

and  therefore 

E[x^(i)  e(i)]  »  0  (B.3) 
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-x%  ■ 


1.  Change  of  the  Correlator  Mean  with  Aa  (i) 
The  output  of  the  correlator  la 

n-m 


c(n)  s  W--&)  S  *1(J)  Y(J) 

Jan 


Using  Eq.  (B.S),  one  obtains 


BlijU)  Y(i)]  -  Efx^DjxjU)  +  e(i)  +  W(i)|] 
a  E[Xj(i)]  +  E[w(i)  ^(1)1 


(B.6) 


(B.7) 


Since  the  signal  and  noise  are  independent, 


E[w(i)  SjU)] 


»(i)  £  aij  + 

J 


-  au  E[W4(i)] 


(B.8) 


Frou  the  first  theoren  in  Chapter  IV  it  is  possible  to  bound  the 
change  in  1*^1  **  ■  Increases.  It  is  also  known  that  the  change  in 

I[5*(i)]  will  be  less  than  As  .  Then  the  change  in  the  naan  of  C(n) 
can  be  bounded. 


2.  The  Change  in  Variance  with  As  (i) 
Write 

>a- 


Yn}  ]  -  Rx+w  M(l) 


I  I  *ik  **ij  **♦*<*“*>  <B’9> 

k  J 
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Since  llRx+w(t)ll  18  m*LXimum  when  t  equals  zero,  the  sum  is  maximized 
for  a  constant  ||AA|!  by  letting  AA  be  different  from  zero  at  only  one 
place,  i.e.  , 


^ij  "  4  M* 


(i). 


(B.10) 


Then 


max 


2 

■  [{[M(lV  Yn|  ]  §  ||M(i)||2  Rx+W(0)  (B.ll) 


Now  ||  [M'  ']  Y  ||  can  be  specified  to  be  less  than  some  small 
2  n 

fraction  of  Og  ^  with  a  desired  probability.  For  example,  if 


lltfD  40 


(B. 12) 


k(0,t 


than  10  . 


the  probability  that  J|[M'  ']  Yq||  >  1/10  a*  ^  is  less 

If  l)/^1  Yn||  is  specified  to  be  less  than  1/a  erg  .  ,  ^  the  magnitude 

2  2  ^ 
of  the  change  in  E[x^(i)  Y  ( i )  1  as  n  will  be  less  than 


/ (:  J  y2<‘>  "<tJ  -  E[s?(i>L  yJ(1) 


I  n=» 


] 


*  (i)  |  dF(Yoo) 


f  2  V 

a_i_  Jt(i,  2^*, 


(i) 


n*® 


«  ( i ) 


(B.13) 
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Using  Eqs.  (b.3)  and  (B.4)  and  noting  that  a^  §  1,  the  magnitude  of 
the  difference  is  less  than 


!41(i)  +  !^1(i)Vi)  (B-14) 

A  similar  expansion  shows  that  the  magnitude  of  E[$^(i)Y(i)  •  ( j ) Y ( j ) ] 

is  changed  by  less  than 


2  2 

u  \(i) 


(B. 15) 


The  techniques  of  Roe  and  White  [Ref.  15]  may  be  used  to  determine  the 
correlator  variance,  and  Eqs.  (B.14)  and  (B..  15)  may  be  used  to  bound 
the  total  change.  The  two  methods  may  be  compared  to  decide  whether 
the  ratio 

'  ik**1  >11 

_  r1* 

2 

and  thus  ,  is  small  enough.  These  methods  will  probably  give  an 
unduly  pessimistic  answer.  Usually,  a  ratio  of  1/10  will  be  sufficient. 


SEL-  64- 131 


76  - 


-  - 


APPENDIX  C.  DERIVATION  OF  THE  FORM  OF  THE  NEAR-OPTIMUM 
FILTER  FOR  CONTINUOUS  PARAMETER  PROCESSES 

This  appendix  contains  the  derivation  of  the  equations  for  an 
adaptive  filter  which  may  be  synthesized  from  many  narrowband  filters. 
This  filter  will  have  a  particularly  simple  adaptive  procedure.  Consider 
the  system  of  Fig.  8.  The  are  ideal  nonoverlapping  filters  Af  in 

bandwidth  which  completely  fill  the  bandpass  -B  to  B;  the  G0^  are 
the  optimum  smoothing  filters  following  the  G^.  Appendix  D  shows  that 
the  error  for  a  steady-state  smoothing  filter  is 


and  R8(t)  is  the  autocorrelation  function  of  the  signal;  1  is 

the  Laplace  transform  of  the  inverse  Fourier  transform;  sBS(*)  is  the 

signal  power  spectrum,  and  sli({)  to  the  spectrum  of  the  signal  plus 

the  noise.  Assusw  that  £  G,  equals  an  ideal  filter  of  bandwidth  B 
.  i  1 

(equal  to  unity  over  the  frequency  range  of  interest). 

Now,  examine  the  error  in  the  1th  channel  of  Fig.  8.  The  signal 
and  noise  spectra  are 


S 

ss 


S 

nn 


=  S  I  G  1  2 

U)  “  1 

(C.3) 

«  S  !G  I2 

(C.4) 

(0  “  1 
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Assume  no  correlation  between  signal  and  noise.  Then 


and 


S.  *  |G.  I2  [S  +  S  ]  =  I G.  1  2  S 

ii^  1  i1  ss  nnJ  1  i1  ii 

(C.5) 

S  -  -  0*  s  - 

Xi(i)  i  ii 

(C.6) 

S  +  =  G  S  + 

ii(i)  i  ii 

(C.7) 

(C.8) 


FIG.  8.  BLOCK  DIAGRAM  OF  NARROWBAND  PARALLEL  FILTER  SYSTEM. 


Since  there  is  no  crosscorrelation  between  channels,  the  total  error 
for  all  channels  is 


2 

®T 


(C.9) 
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The  first  integral  on  the  right  is 


°o  n 

f  s„  2  t°ii2  =  *.(o) 

J  i=l 


(c.io) 


Now  evaluate  the  second  integral: 

2  n  n 


|£rlfe°‘!|  <c-u: 

Let  Ai  be  small  so  that  S  (f)  and  S  (f)  are  constant  over  the 

ss  nn 


interval .  Then 


W 


- —  h-»(0i) - six —  o 


(C.l») 


where 

Also, 


S  (f  )  and  N  (f  )  are  the  spectral  densities  in  the  interval. 


Note  that  if 
G*(f)  =  0. 


W 


(C.13) 


0*(f )  t  0,  then  G  (f)  ■  0;  and  if  G^f)  +  0,  then 


ib  f  d•'o•  <cm) 

-j" 

Assume  that  S  ,  S  .  and  G,  can  be  expressed  as  the  ratio  of 

ss  ii  i 

polynomials  (they  can  always  be  approximated  as  closely  as  desired  by 
such  a  ratio).  Then  the  partial  fraction  expansion  can  be  made  as 
follows: 
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11  &  M  m 


mi 


S  -  b 


(C. 15) 


mi 


where  a^  3  0  and  bmi  3  0.  The  right-hand  sum  has  an  inverse  Fourier 
transform  that  is  zero  for  t  g  0.  So, 


i=l  '  “  1  i=l  l  U 


(C. 16) 


Thus , 

FS 


■fel  •  »-fe  i  •  ■>"[  i  £  •■!  ■  i  •.) 


(C. 17) 


Using  Eqs.  (C.ll),  (C.14),  and  (C.17),  one  can  see  that  the  second 
integral  of  Eq.  (C.9)  is 


-joo 

Also  note  that 


ds  (C.18) 


77  & 


i  W  °i  w 

[H  (f  )  +  S  (f  )]^  "  +  So^fi^ 

L  o  i  o'  i  _ 


*  VVV  *  V*!*1' 

(C.l») 

Figure  9  and  Eqs.  (C.9),  (C.18),  and  (C.19)  shoe  that  if  the  outputs 
of  the  parallel  filters  are  weighted  by 


8q(V  Si 

M0(fi)  ♦  So(f1)  *  Mt  ♦  St 
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c.  Equivalent  of  Fig.  9b 


F10.  9.  DIAGRAM  FOR  FINDING  THE  ERROR  OF  THE  PARALLEL  FILTERS. 
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where  8^  and  are  signal  and  noise  powers  [see  Eq.  (C.20)],  and 

summed,  the  resulting  transfer  function  is  Wiener  optimum.  If  a  number 
of  such  narrowband  filters  can  be  constructed,  and  if  estimates  of  the 
signal  and  noise  powers  from  the  narrowband  filters  are  made,  then 
weighting  potentiometers  with  zero-to-one  ranges  can  be  adjusted  to  give 
an  adaptive  Wiener  filter. 

Shortly,  it  will  be  shown  how  Fig.  10  may  be  modified  to  use  nonideal 
filters.  But,  first,  the  effects  of  a  misadjustment  of  the  potentiometers 
will  be  investigated.  A  reasonable  design  objective  would  be  an  adaptive 
filter  that  was  some  specified  decibels  worse  than  optimum.  For  example, 
one  might  want  the  ratio  of  the  mean-squared  error  in  each  channel  to 
the  average  signal  power  per  channel  to  be  "within  10  percent  of  that 
obtainable  if  the  signal  spectrum  were  known  exactly."  Let  K  be  the 
nonideal  potentiometer  setting.  The  mean- squared  error  with  this 
setting  is 


2 

e 


+  k2s 


ii 


ES^KllGil2  df 


(C.20) 


Again  assuming  nearly  constant  spectra, 

Af  Sit  «  S  +  H 

Af  S  «  S 

SB 

(The  ith  channel  is  being  considered,  and  for  convenience  the  subscript 
i  will  be  dropped.)  If  eQ  is  the  minimum  mean-squared  error,  define: 

rrs]  <C-S1> 

Then,  solving  Eq.  (C.21)  for  K  gives 

S  -  [D  (S  +  N)]'* 

* - sTk -  ('•”> 
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and 


K 


[Dq(S  +  N)]y* 
S  +  N 


S 


S  -  6 


S  +  N 

A 

S 


•  €  +  N 

A, 

S  +  € 


S  +  N 


+  N  S  +  el  +  N 


6  >  0 


6  <  0 


(C. 23) 


S  + 


+  N 


where  €  is  the  measurement  error  of  S,  and  N  is  assumed  to  be 
known  exactly  a  priori  (this  is  reasonable  since  there  is  usually  an 
arbitrarily  long  time  to  determine  N).  Then 


D 

o 


5 


2 


e _ _ 

S  +  N 


(C. 24) 


fi(t) 


o 


b.  Error  spectrum  due  to  noise 


c.  Error  spectrum  due  to  signal 


FIO,  10.  THE  BLOCK  DIAGRAMS  FOR  FINDING  THE  ERROR  SPECTRUM  OUT 
OF  THE  ith  CHANNEL. 


83  - 


S  EL- 64- 131 


The  foregoing  analysis  has  assumed  ideal  bandpass  filters,  i.e.,  no 
crosscorrelation  between  filters.  In  practice,  of  course,  it  is  not 
possible  to  construct  ideal  filters.  Assume  that  the  1th  filter  is 
not  ideal  but  is  sharp  and  overlaps  only  its  adjacent  neighbors.  Then 
the  additional  cross-power  error  due  to  the  ith  channel  is  (see  Fig.  10) 


•00 


(C. 25) 


Equation  (C.25)  provides  a  very  simple  swans  for  detensining  how  sharp 
the  filter  cutoff  should  be. 

There  are  now  three  conditions  on  the  n  narrowband  filters  that 
~S  ~S 

will  allow  e^  to  approach  e  arbitrarily  cloaely: 

1.  The  overlap  an  arbitrarily  small  amount , 

n 

2.  ^  G1(ju)  *  Ke  '*wT,  where  K  is  a  constant  and  t  is  the  delay 
1*1 

through  each  Gi  over  the  frequency  range  of  interest. 

3.  The  bandwidth  of  is  arbitrarily  sswll,  or  the  spectra  into  G^ 

are  nearly  constant. 

A  synthesis  procedure  to  swet  these  conditions  will  now  be  described. 
If  each  Gi  is  to  cover  the  frequency  range 

-  (i  -  1)  to  -  i 
n  n 
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the  0^  can  be  formed  by  taking  the  difference  between  two  lowpass 
filters  with  identical  linear  phase  shift  or  delay;  i.e., 

Gi  *  e",T  (H4  -  Hi_1)  (C.26) 

where  e  H^s)  passes  from  zero  to  (B/n)i,  and  is  the  transfer 

function  of  a  zero-shift  sharp-cutoff  filter.  Obviously, 


n 


I 


(C. 27 ) 


thereby  fulfilling  condition  2. 

If  is  ideal,  its  impulse  response  is 


■>,(«) 


(C.28) 


The  transfer  function  e  H^s)  can  be  made  realizable  by  truncating 
hA(t)  at  ±T.  Then  the  impulse  response  of  the  nonideal  HjL(s)  is 


ht(t) 


(C.29) 


where 


•  t  s  t  £  T 

elsewhere 


The  transform  of  b(t)  is  a  sine  function;  taking  the  transform  of 
hi(t),  it  is  seen  that  H^s)  is  the  convolution  of  the  sine  function 
with  the  ideal  lowpass  filter  (see  fig.  11).  Then,  in  the  vicinity  of 
2*(B/n)i , 


(C.30) 
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The  integral  is  tabulated,  giving  a  simple  method  for  determining  the 
delay  needed  for  a  specified  sharpness  of  cutoff. 

A  bandlimited  filter  with  an  impulse  response  2t  long  may  be 
constructed  with  a  tapped  delay  line  and  a  lowpass  filter  [Ref.  16]  as 
shown  in  Fig.  12.  The  setting  of  the  potentiometer  on  the  jth  tap, 
aij,  when  synthesizing  e  H^s)  is 


,  (JT  -  T)] 
“ij  ”  fl(jT  -  T) 


(C. 31 ) 


A  practical  form  of  the  adaptive  filter  is  shown  in  Fig.  13. 


(C.32) 


th 

The  potentiometers  in  the  i  row  correspond  to  the  of  Fig.  10, 

and  the  signal  and  noise  measurements  are  made  at  the  output  of  GQ^. 
However,  if  those  measurements  are  made  at  the  output  of  some  other 
narrowband  filter,  a  great  simplif ication  of  the  adaptive  filter  is 
possible.  Block  diagram  substitution  will  reduce  the  form  of  the 
adaptive  filter  shown  in  Fig.  13  to  the  fora  shown  in  Fig.  12,  where 
the  jth  potentiometer  setting  is  given  very  simply  by 


•,-x 


S.  +  N. 


(C.33) 
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FIG.  12.  SYNTHESIS  OF  e‘8T  Ht  AND  OF  THE 
COMPACT  FORM  OF  THE  ADAPTIVE  FILTER. 


FIG.  13.  A  PRACTICAL  FORM  OF  THE  ADAPTIVE  FILTER. 


If  the  filtering  operations  are  to  be  performed  digitally,  one  nay 
go  froa  the  delay-line  transfer  function  to  numerical  techniques  in  an 
almost  trivial  manner.  If  f(mT)  is  the  sample  value,  at  t  *  mT,  of 
the  output  of  the  lowpass  filter  of  Fig.  12,  then  the  sample  value  at 
the  output  of  the  sum  of  the  weighted  taps  is  given  by 

k 

o 

c(nT)  -  ^  a^f^n  *  J)T1  (C.34) 

J-0 

Since  c(t)  is  bandllmlted,  knowing  c(nT)  gives  all  Information  about 
c(t).  It  is  also  obvious  how  the  set  of  difference  equations  appropriate 
to  the  circuit  in  Fig.  13  is  used. 
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In  practice,  the  spectra  will  not  necessarily  be  constant  over  the 
range  of  Af .  However,  each  channel  may  be  considered  as  the  sum  of  an 
infinite  number  of  infinitely  narrow  channels — each  with  the  same 
potentiometer  setting,  G0^.  Obviously,  this  potentiometer  setting  will 
not  be  as  good  for  some  of  these  channels  as  for  others.  The  equations 
developed  in  the  estimation  part  of  this  appendix  may  be  used  to  determine 
Af  under  an  assumption  of  the  possible  slopes  of  spectra  to  be  encountered. 
The  spectra  can  vary  considerably  from  a  constant  value  with  little  harm. 


88 


SSL- 64- 131 


APPENDIX  D.  STEADY-STATE  ERROR  IN  A  WIENER  FILTER 

For  independent  signals  and  noise,  the  mean- squared  error  from  an 
optimum  filter  is  [Ref.  17] 

00 

•2-  j  *i°.i2sii-[0o0d  +  oX]s..}  <D1> 

-  00 

where 


S 

ss 

S 


nn 


signal  spectrum 


noise  spectrum 


S  +  S  -  S  ♦  S 
ss  nn  ii 


0,  t  <  0 


ii 


desired  operation 

optimum  filter  given  by  0^(8) 


Then 
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Let 


and 


then 


'■w- 


By  Parseval's  theorem, 


(D.3) 


(D.4) 


(D.5) 


Then 


(D.7) 
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APPENDIX  E.  THE  STEADY-STATE  MINIMUM-MEAN-SQUARED- ERROR 

SAMPLED- DATA  FILTER 


Assume  the  filter  is  to  be  of  the  form  shown  in  Fig.  13  and  it  is 
desired  to  adjust  the  a^  to  give  minimum  mean- squared  error.  The 
Z-transform  of  the  tapped-delay-line  filter  is 

k 

o 

G(Z)  =  ^  a„  Z“n  (E.l) 

0 

The  equation  for  the  mean-squared  error  is  well  known  [Ref.  18]  and  is 


•2  ■  Rx<0>  ♦  2 h  <f>  jwz>[  2  *„  2  *.  *■] 

ro 

k 

-  2  *n(Z'”+f  *  }  f  <*•■> 

n=0  ' 


_^T 

The  desired  operation  on  the  signal  is  e  .If  it  is  desired  to  solve 
for  the  filter  giving  the  best  estimate  of  the  delayed  signal,  i  is  a 
positive  integer.  If  the  best  predictor  is  desired,  £  is  a  negative 
integer.  The  term  R  (t)  is  the  signal  autocorrelation  function;  S  (z) 

X  AT  W 

is  the  sampled  noise  plus  signal  spectrum;  sx(z)  is  the  sampled  signal 
spectrum;  and  the  integration  is  around  the  unit  circle.  Uncorrelated 
signal  and  noise  are  assumed . 

The  optimum  setting  for  a.  may  be  found  by  setting  the  partial 
“5  1 

derivative  of  e  with  respect  to  a^  equal  to  zero  and  solving  for  a^. 


k 


o 


(E.3) 
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where,  for  simplicity,  i  =  0.  Parseval's  theorem  for  discrete  systems  is 


13  </>  F1<Z)  VZ"l>  f  -  13  ^  P1<Z_1>  P2(Z> 


dZ 

Z 


£  f 1(nT)  f2(nT) 


(E.4) 


Then  a  solution  to  Eq.  (E.3)  is 


T 

2nj 


K 

j>  f  w*>  i  *»  z“°  - ^ 


2nJ  ^  VZ>Z‘f-° 


(*.») 


Again,  using  Parseval's  theorem, 


23  f  <WZ>  i  \  z“°  ■  I 

•L  0  n-0 


E  [(n-i)T]a 
xt-wlv  J  n 


(*.«) 


and  taking  the  inverse 


1  dZ  M1T) 

8X(Z)  «‘  f  -  J V- 


(1.7) 
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we  have  (kQ+l)  equations  of  the  form 

k 

o 

S  *»  ■  M1T>  <E-“) 

n*0 

which  can  be  expressed  in  matrix  form  as 


R  (0) 

X+W 

R  (T) 
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R  (2T) 
x+w' 

R  (k  T) 
x+w'  o 

m  ■ 

*0 

R  (0) 

R  (T) 

X+W 

R  (0) 
x+w 

R  (T) 
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*1 

R  (T) 
x 

R  (2T) 
x+w' 

• 

R  (T) 
x+w 

• 

R  (0) 

x+w 

• 

• 

• 

*2 

at 

R  (2T) 
x 

• 

• 

• 

R  (k  T) 
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• 

• 

•  • 

• 

•  •  • 

R  (0) 
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